Preface


This book is about constrained optimization. It begins with a thorough treatment
of linear programming and proceeds to convex analysis, network flows, integer
programming,  quadratic  programming,  and  convex  optimization.  Along  the  way, 
dynamic  programming  and  the  linear  complementarity  problem  are  touched  on  as 
well. 

The  book  aims  to  be  a  first  introduction  to  the  subject.  Specific  examples  and 
concrete  algorithms  precede  more  abstract  topics.  Nevertheless,  topics  covered  are 
developed  in  some  depth,  a  large  number  of  numerical  examples  are  worked  out 
in  detail,  and  many  recent  topics  are  included,  most  notably  interior-point  methods. 
The  exercises  at  the  end  of  each  chapter  both  illustrate  the  theory  and,  in  some  cases, 
extend  it. 

Prerequisites.  The  book  is  divided  into  four  parts.  The  first  two  parts  assume 
a  background  only  in  linear  algebra.  For  the  last  two  parts,  some  knowledge  of 
multivariate  calculus  is  necessary.  In  particular,  the  student  should  know  how  to  use 
Lagrange  multipliers  to  solve  simple  calculus  problems  in  2  and  3  dimensions. 

Associated  software.  It  is  good  to  be  able  to  solve  small  problems  by  hand, 
but  the  problems  one  encounters  in  practice  are  large,  requiring  a  computer  for  their 
solution. Therefore, to fully appreciate the subject, one needs to solve large (practical)
problems on a computer. An important feature of this book is that it comes
with  software  implementing  the  major  algorithms  described  herein.  At  the  time  of 
writing,  software  for  the  following  five  algorithms  is  available: 

•  The  two-phase  simplex  method  as  shown  in  Figure  6.1. 

•  The  self-dual  simplex  method  as  shown  in  Figure  7.1. 

•  The  path-following  method  as  shown  in  Figure  18.1. 

•  The  homogeneous  self-dual  method  as  shown  in  Figure  22.1. 

•  The  long-step  homogeneous  self-dual  method  as  described  in  Exercise 
22.4. 

The  programs  that  implement  these  algorithms  are  written  in  C  and  can  be 
easily  compiled  on  most  hardware  platforms.  Students/instructors  are  encouraged 
to  install  and  compile  these  programs  on  their  local  hardware.  Great  pains  have 
been  taken  to  make  the  source  code  for  these  programs  readable  (see  Appendix  A). 
In  particular,  the  names  of  the  variables  in  the  programs  are  consistent  with  the 
notation of this book.

There are two ways to run these programs. The first is to prepare the input in
a standard computer-file format, called MPS format, and to run the program using
such a file as input. The advantage of this input format is that there is an archive
of problems stored in this format, called the NETLIB suite, that one can download
and use immediately (a link to the NETLIB suite can be found at the web site
mentioned below). But this format is somewhat archaic and, in particular, it is not easy
to create these files by hand. Therefore, the programs can also be run from within a
problem modeling system called AMPL. AMPL allows one to describe mathematical
programming problems using an easy-to-read, yet concise, algebraic notation.
To run the programs within AMPL, one simply tells AMPL the name of the solver
program before asking that a problem be solved. The text that describes AMPL,
Fourer et al. (1993), makes an excellent companion to this book. It includes a
discussion of many practical linear programming problems. It also has lots of exercises
to hone the modeling skills of the student.

Several interesting computer projects can be suggested. Here are a few suggestions
regarding the simplex codes:

•  Incorporate the partial pricing strategy (see Section 8.7) into the two-phase
simplex method and compare it with full pricing.

•  Incorporate the steepest-edge pivot rule (see Section 8.8) into the two-phase
simplex method and compare it with the largest-coefficient rule.

•  Modify the code for either variant of the simplex method so that it can
treat bounds and ranges implicitly (see Chapter 9), and compare the performance
with the explicit treatment of the supplied codes.

•  Implement a “warm-start” capability so that the sensitivity analyses discussed
in Chapter 7 can be done.

•  Extend the simplex codes to be able to handle integer programming problems
using the branch-and-bound method described in Chapter 23.

As  for  the  interior-point  codes,  one  could  try  some  of  the  following  projects: 

•  Modify  the  code  for  the  path-following  algorithm  so  that  it  implements 
the  affine-scaling  method  (see  Chapter  21),  and  then  compare  the  two 
methods. 

•  Modify  the  code  for  the  path-following  method  so  that  it  can  treat  bounds 
and  ranges  implicitly  (see  Section  20.3),  and  compare  the  performance 
against  the  explicit  treatment  in  the  given  code. 

•  Modify the code for the path-following method to implement the higher-order
method described in Exercise 18.5. Compare.

•  Extend  the  path-following  code  to  solve  quadratic  programming  problems 
using  the  algorithm  shown  in  Figure  24.3. 

•  Further  extend  the  code  so  that  it  can  solve  convex  optimization  problems 
using  the  algorithm  shown  in  Figure  25.2. 

And,  perhaps  the  most  interesting  project  of  all: 

•  Compare the simplex codes against the interior-point code and decide for
yourself which algorithm is better on specific families of problems.

The software implementing the various algorithms was developed using consistent
data structures, and so making fair comparisons should be straightforward. The
software can be downloaded from the following web site:

http://www.princeton.edu/~rvdb/LPbook/

If,  in  the  future,  further  codes  relating  to  this  text  are  developed  (for  example,  a 
self-dual  network  simplex  code),  they  will  be  made  available  through  this  web  site. 

Features. Here are some other features that distinguish this book from others:

•  The  development  of  the  simplex  method  leads  to  Dantzig’s  parametric 
self-dual  method.  A  randomized  variant  of  this  method  is  shown  to  be 
immune  to  the  travails  of  degeneracy. 

•  The book gives a balanced treatment to both the traditional simplex method
and the newer interior-point methods. The notation and analysis are developed
to be consistent across the methods. As a result, the self-dual
simplex method emerges as the variant of the simplex method with most
connections to interior-point methods.

•  From the beginning and consistently throughout the book, linear programming
problems are formulated in symmetric form. By highlighting symmetry
throughout, it is hoped that the reader will more fully understand
and appreciate duality theory.

•  By slightly changing the right-hand side in the Klee-Minty problem, we
are able to write down an explicit dictionary for each vertex of the Klee-Minty
problem and thereby uncover (as a homework problem) a simple,
elegant argument why the Klee-Minty problem requires $2^n - 1$ pivots to
solve.

•  The  chapter  on  regression  includes  an  analysis  of  the  expected  number 
of  pivots  required  by  the  self-dual  variant  of  the  simplex  method.  This 
analysis  is  supported  by  an  empirical  study. 

•  There is an extensive treatment of modern interior-point methods, including
the primal-dual method, the affine-scaling method, and the self-dual
path-following method.

•  In addition to the traditional applications, which come mostly from business
and economics, the book features other important applications such
as the optimal design of truss-like structures and L1-regression.

Exercises  on  the  Web.  There  is  always  a  need  for  fresh  exercises.  Hence,  I  have 
created  and  plan  to  maintain  a  growing  archive  of  exercises  specifically  created  for 
use  in  conjunction  with  this  book.  This  archive  is  accessible  from  the  book’s  web 
site: 


http://www.princeton.edu/~rvdb/LPbook/

The  problems  in  the  archive  are  arranged  according  to  the  chapters  of  this  book  and 
use  notation  consistent  with  that  developed  herein. 

Advice on solving the exercises. Some problems are routine while others are
fairly challenging. Answers to some of the problems are given at the back of the book.
In general, the advice given to me by Leonard Gross (when I was a student) should
help even on the hard problems: follow your nose.

Audience.  This  book  evolved  from  lecture  notes  developed  for  my  introductory 
graduate  course  in  linear  programming  as  well  as  my  upper-level  undergraduate 
course.  A  reasonable  undergraduate  syllabus  would  cover  essentially  all  of  Part  1 
(Simplex  Method  and  Duality),  the  first  two  chapters  of  Part  2  (Network  Flows 
and  Applications),  and  the  first  chapter  of  Part  4  (Integer  Programming).  At  the 
graduate  level,  the  syllabus  should  depend  on  the  preparation  of  the  students.  For  a 
well-prepared  class,  one  could  cover  the  material  in  Parts  1  and  2  fairly  quickly  and 
then  spend  more  time  on  Parts  3  (Interior-Point  Methods)  and  4  (Extensions). 

Dependencies.  In  general,  Parts  2  and  3  are  completely  independent  of  each 
other.  Both  depend,  however,  on  the  material  in  Part  1.  The  first  Chapter  in  Part  4 
(Integer  Programming)  depends  only  on  material  from  Part  1,  whereas  the  remaining 
chapters  build  on  Part  3  material. 

Acknowledgments.  My  interest  in  linear  programming  was  sparked  by  Robert 
Garfinkel  when  we  shared  an  office  at  Bell  Labs.  I  would  like  to  thank  him  for 
his  constant  encouragement,  advice,  and  support.  This  book  benefited  greatly  from 
the  thoughtful  comments  and  suggestions  of  David  Bernstein  and  Michael  Todd.  I 
would  also  like  to  thank  the  following  colleagues  for  their  help:  Ronny  Ben-Tal, 
Leslie Hall, Yoshi Ikura, Victor Klee, Irvin Lustig, Avi Mandelbaum, Marc Meketon,
Narcis Nabona, James Orlin, Andrzej Ruszczynski, and Henry Wolkowicz. I
would  like  to  thank  Gary  Folven  at  Kluwer  and  Fred  Hillier,  the  series  editor,  for 
encouraging  me  to  undertake  this  project.  I  would  like  to  thank  my  students  for 
finding  many  typos  and  occasionally  more  serious  errors:  John  Gilmartin,  Jacinta 
Warnie, Stephen Woolbert, Lucia Wu, and Bing Yang. My thanks to Erhan Çinlar
for the many times he offered advice on questions of style. I hope this book reflects
positively on his advice. Finally, I would like to acknowledge the support of
the National Science Foundation and the Air Force Office of Scientific Research
while I was writing this book. In a time of declining resources, I am
especially grateful for their support.

For the 2nd edition, many new exercises have been added. Also, I have worked
hard to develop online tools to aid in learning the simplex method and duality theory.
These  online  tools  can  be  found  on  the  book’s  web  page: 

http://www.princeton.edu/~rvdb/LPbook/

and  are  mentioned  at  appropriate  places  in  the  text.  Besides  the  learning  tools,  I  have 
created  several  online  exercises.  These  exercises  use  randomly  generated  problems 
and  therefore  represent  a  virtually  unlimited  collection  of  “routine”  exercises  that 
can be used to test basic understanding. Pointers to these online exercises are included
in the exercises sections at appropriate points.

Some  other  notable  changes  include: 

•  The  chapter  on  network  flows  has  been  completely  rewritten.  Hopefully, 
the  new  version  is  an  improvement  on  the  original. 

•  Two  different  fonts  are  now  used  to  distinguish  between  the  set  of  basic 
indices  and  the  basis  matrix. 

•  The first edition placed great emphasis on the symmetry between the primal
and the dual (the negative transpose property). The second edition
carries this further with a discussion of the relationship between the basic
and nonbasic matrices $B$ and $N$ as they appear in the primal and in the
dual. We show that, even though these matrices differ (they even have
different dimensions), $B^{-1}N$ in the dual is the negative transpose of the
corresponding matrix in the primal.

•  In the chapters devoted to the simplex method in matrix notation, the collection
of variables $z_1, z_2, \ldots, z_n, y_1, y_2, \ldots, y_m$ was replaced, in the first
edition, with the single array of variables $y_1, y_2, \ldots, y_{n+m}$. This caused
great confusion as the variable $y_i$ in the original notation was changed
to $y_{n+i}$ in the new notation. For the second edition, I have changed the
notation for the single array to $z_1, \ldots, z_{n+m}$.

•  A  number  of  figures  have  been  added  to  the  chapters  on  convex  analysis 
and  on  network  flow  problems. 

•  The algorithm referred to as the primal-dual simplex method in the first
edition has been renamed the parametric self-dual simplex method in accordance
with prior standard usage.

•  The last chapter, on convex optimization, has been extended with a discussion
of merit functions and their use in shortening steps to make some
otherwise nonconvergent problems converge.

Acknowledgments.  Many  readers  have  sent  corrections  and  suggestions  for 
improvement.  Many  of  the  corrections  were  incorporated  into  earlier  reprintings. 
Only  those  that  affected  pagination  were  accrued  to  this  new  edition.  Even  though 
I  cannot  now  remember  everyone  who  wrote,  I  am  grateful  to  them  all.  Some  sent 
comments  that  had  significant  impact.  They  were  Hande  Benson,  Eric  Denardo, 
Sudhakar  Mandapati,  Michael  Overton,  and  Jos  Sturm. 

Princeton, NJ, USA  Robert J. Vanderbei

It has been almost 7 years since the 2nd edition appeared and the publisher is
itching for me to finish a new edition. The previous edition had very few typos. I
have fixed them all! Of course, I’ve also added some new material and who knows
how many new typos I’ve introduced. The most significant new material is contained
in a new chapter on financial applications, which discusses a linear programming
variant of the portfolio selection problem and option pricing. I am grateful to
Alex d’Aspremont for pointing out that the option pricing problem provides a nice
application of duality theory. Finally, I’d like to acknowledge the fact that half (four
out of eight) of the typos were reported to me by Trond Steihaug. Thanks Trond!

Besides the ongoing tweaking and refining of the language and presentation of
the material, this edition also features new material in Chapters 4 and 12 on the
average performance of the simplex method.

I’d  like  to  thank  Cagin  Ararat  and  Firdevs  Ulus  for  carefully  reviewing  and 
commenting on this new material.

Part 1

Basic  Theory:  The  Simplex  Method 

and  Duality 


We all love to instruct, though we can teach only
what  is  not  worth  knowing.  —  J.  Austen 


CHAPTER  1 


Introduction 


This  book  is  mostly  about  a  subject  called  Linear  Programming.  Before  defining 
what  we  mean,  in  general,  by  a  linear  programming  problem,  let  us  describe  a  few 
practical  real-world  problems  that  serve  to  motivate  and  at  least  vaguely  to  define 
this  subject. 


1.  Managing  a  Production  Facility 

Consider  a  production  facility  for  a  manufacturing  company.  The  facility  is 
capable  of  producing  a  variety  of  products  that,  for  simplicity,  we  enumerate  as 
1,  2, . . . ,  n.  These  products  are  constructed/manufactured/produced  out  of  certain 
raw  materials.  Suppose  that  there  are  m  different  raw  materials,  which  again  we 
simply  enumerate  as  1,  2, . . . ,  m.  The  decisions  involved  in  managing/operating  this 
facility  are  complicated  and  arise  dynamically  as  market  conditions  evolve  around 
it.  However,  to  describe  a  simple,  fairly  realistic  optimization  problem,  we  consider 
a  particular  snapshot  of  the  dynamic  evolution.  At  this  specific  point  in  time,  the 
facility has, for each raw material i = 1, 2, ..., m, a known amount, say $b_i$, on
hand. Furthermore, each raw material has at this moment in time a known unit
market value. We denote the unit value of the ith raw material by $p_i$.

In addition, each product is made from known amounts of the various raw materials.
That is, producing one unit of product j requires a certain known amount,
say $a_{ij}$ units, of raw material i. Also, the jth final product can be sold at the known
prevailing market price of $\sigma_j$ dollars per unit.

Throughout  this  section  we  make  an  important  assumption: 

The  production  facility  is  small  relative  to  the  market  as  a  whole 
and therefore cannot through its actions alter the prevailing market
value of its raw materials, nor can it affect the prevailing
market  price  for  its  products. 

We  consider  two  optimization  problems  related  to  the  efficient  operation  of  this 
facility  (later,  in  Chapter  5,  we  will  see  that  these  two  problems  are  in  fact  closely 
related  to  each  other). 

1.1.  Production  Manager  as  Optimist.  The  first  problem  we  wish  to  consider 
is  the  one  faced  by  the  company’s  production  manager.  It  is  the  problem  of  how  to 
use the raw materials on hand. Let us assume that she decides to produce $x_j$ units
of the jth product, j = 1, 2, ..., n. The revenue associated with the production of
one unit of product j is $\sigma_j$. But there is also a cost of raw materials that must be
considered. The cost of producing one unit of product j is $\sum_{i=1}^{m} p_i a_{ij}$. Therefore,
the net revenue associated with the production of one unit is the difference between
the revenue and the cost. Since the net revenue plays an important role in our model,
we introduce notation for it by setting

$$c_j = \sigma_j - \sum_{i=1}^{m} p_i a_{ij}, \qquad j = 1, 2, \ldots, n.$$

Now, the net revenue corresponding to the production of $x_j$ units of product j is
simply $c_j x_j$, and the total net revenue is

$$\sum_{j=1}^{n} c_j x_j. \tag{1.1}$$

The production manager’s goal is to maximize this quantity. However, there are constraints
on the production levels that she can assign. For example, each production
quantity $x_j$ must be nonnegative, and so she has the constraints

$$x_j \ge 0, \qquad j = 1, 2, \ldots, n. \tag{1.2}$$

Secondly, she can’t produce more product than she has raw material for. The amount
of raw material i consumed by a given production schedule is $\sum_{j=1}^{n} a_{ij} x_j$, and so
she must adhere to the following constraints:

$$\sum_{j=1}^{n} a_{ij} x_j \le b_i, \qquad i = 1, 2, \ldots, m. \tag{1.3}$$

To summarize, the production manager’s job is to determine production values $x_j$,
j = 1, 2, ..., n, so as to maximize (1.1) subject to the constraints given by (1.2) and
(1.3). This optimization problem is an example of a linear programming problem.
This particular example is often called the resource allocation problem.
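
As a minimal sketch of the model just described, the following Python fragment computes the net revenues c_j, evaluates the total net revenue (1.1), and checks constraints (1.2) and (1.3) for one candidate production plan. All of the numbers (two raw materials, three products) are made up for illustration, and the function names are hypothetical; they are not taken from the software that accompanies the book.

```python
# Hypothetical data: m = 2 raw materials, n = 3 products.
b = [100.0, 80.0]            # b[i]: amount of raw material i on hand
p = [1.0, 2.0]               # p[i]: unit market value of raw material i
sigma = [10.0, 12.0, 9.0]    # sigma[j]: market price of product j
a = [[2.0, 3.0, 1.0],        # a[i][j]: units of raw material i needed
     [1.0, 2.0, 2.0]]        #          to make one unit of product j


def net_revenues(sigma, p, a):
    """c_j = sigma_j - sum_i p_i * a_ij for every product j."""
    m, n = len(p), len(sigma)
    return [sigma[j] - sum(p[i] * a[i][j] for i in range(m)) for j in range(n)]


def total_net_revenue(c, x):
    """Objective (1.1): sum_j c_j * x_j."""
    return sum(cj * xj for cj, xj in zip(c, x))


def is_feasible(a, b, x, tol=1e-9):
    """Check nonnegativity (1.2) and the raw-material limits (1.3)."""
    if any(xj < -tol for xj in x):
        return False
    used = [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(b))]
    return all(used[i] <= b[i] + tol for i in range(len(b)))


c = net_revenues(sigma, p, a)
x = [20.0, 10.0, 15.0]       # one candidate production plan
print(c, is_feasible(a, b, x), total_net_revenue(c, x))
```

The sketch only evaluates a given plan; finding the plan that maximizes (1.1) is the job of the algorithms developed later.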

1.2.  Comptroller  as  Pessimist.  In  another  office  at  the  production  facility 
sits  an  executive  called  the  comptroller.  The  comptroller’s  problem  (among  others) 
is  to  assign  a  value  to  the  raw  materials  on  hand.  These  values  are  needed  for 
accounting  and  planning  purposes  to  determine  the  cost  of  inventory.  There  are 
rules  about  how  these  values  can  be  set.  The  most  important  such  rule  (and  the  only 
one  relevant  to  our  discussion)  is  the  following: 

The  company  must  be  willing  to  sell  the  raw  materials  should 
an  outside  firm  offer  to  buy  them  at  a  price  consistent  with  these 
values. 

Let $w_i$ denote the assigned unit value of the ith raw material, i = 1, 2, ..., m.
That is, these are the numbers that the comptroller must determine. The lost opportunity
cost of having $b_i$ units of raw material i on hand is $b_i w_i$, and so the total lost
opportunity cost is

$$\sum_{i=1}^{m} b_i w_i. \tag{1.4}$$




The comptroller’s goal is to minimize this lost opportunity cost (to make the financial
statements look as good as possible). But again, there are constraints. First of
all, each assigned unit value $w_i$ must be no less than the prevailing unit market value
$p_i$, since if it were less an outsider would buy the company’s raw material at a price
lower than $p_i$, contradicting the assumption that $p_i$ is the prevailing market price.
That is,

$$w_i \ge p_i, \qquad i = 1, 2, \ldots, m. \tag{1.5}$$

Similarly,

$$\sum_{i=1}^{m} w_i a_{ij} \ge \sigma_j, \qquad j = 1, 2, \ldots, n. \tag{1.6}$$

To  see  why,  suppose  that  the  opposite  inequality  holds  for  some  specific  product  j. 
Then  an  outsider  could  buy  raw  materials  from  the  company,  produce  product  j,  and 
sell it at a lower price than the prevailing market price. This contradicts the assumption
that $\sigma_j$ is the prevailing market price, which cannot be lowered by the actions
of the company we are studying. Minimizing (1.4) subject to the constraints given
by (1.5) and (1.6) is a linear programming problem. It takes on a slightly simpler
form if we make a change of variables by letting

$$y_i = w_i - p_i, \qquad i = 1, 2, \ldots, m.$$

In words, $y_i$ is the increase in the unit value of raw material i representing the “markup”
the company would charge should it wish simply to act as a reseller and sell raw
materials back to the market. In terms of these variables, the comptroller’s problem
is to minimize

$$\sum_{i=1}^{m} b_i y_i$$

subject to

$$\sum_{i=1}^{m} y_i a_{ij} \ge c_j, \qquad j = 1, 2, \ldots, n$$

and

$$y_i \ge 0, \qquad i = 1, 2, \ldots, m.$$

Note that we’ve dropped a term, $\sum_{i=1}^{m} b_i p_i$, from the objective. It is a constant (the
market value of the raw materials), and so, while it affects the value of the function
being minimized, it does not have any impact on the actual optimal values of the
variables (whose determination is the comptroller’s main interest).
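
To see where the simpler form comes from, one can substitute the markup (the assigned unit value of raw material i minus its market value) into (1.4), (1.5), and (1.6). The following lines are just that substitution written out, with w_i = y_i + p_i and with c_j the net revenue defined in Section 1.1; nothing beyond the definitions above is assumed.

$$\begin{aligned}
\sum_{i=1}^{m} b_i w_i &= \sum_{i=1}^{m} b_i y_i + \sum_{i=1}^{m} b_i p_i, \\
w_i \ge p_i &\iff y_i \ge 0, \\
\sum_{i=1}^{m} w_i a_{ij} \ge \sigma_j &\iff \sum_{i=1}^{m} y_i a_{ij} \ge \sigma_j - \sum_{i=1}^{m} p_i a_{ij} = c_j .
\end{aligned}$$

The first line shows that the two objectives differ only by the constant market value of the raw materials on hand, which is why minimizing either one yields the same optimal variables.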


2.  The  Linear  Programming  Problem 

In  the  two  examples  given  above,  there  have  been  variables  whose  values  are 
to  be  decided  in  some  optimal  fashion.  These  variables  are  referred  to  as  decision 
variables.  They  are  usually  denoted  as 

$$x_j, \qquad j = 1, 2, \ldots, n.$$

In linear programming, the objective is always to maximize or to minimize some
linear function of these decision variables

$$\zeta = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n.$$

This function is called the objective function. It often seems that real-world problems
are most naturally formulated as minimizations (since real-world planners always
seem to be pessimists), but when discussing mathematics it is usually nicer to
work with maximization problems. Of course, converting from one to the other is
trivial both from the modeler’s viewpoint (either minimize cost or maximize profit)
and from the analyst’s viewpoint (either maximize $\zeta$ or minimize $-\zeta$). Since this
book is primarily about the mathematics of linear programming, we usually take the
optimist’s view of maximizing the objective function.

In  addition  to  the  objective  function,  the  examples  also  had  constraints.  Some 
of  these  constraints  were  really  simple,  such  as  the  requirement  that  some  decision 
variable  be  nonnegative.  Others  were  more  involved.  But  in  all  cases  the  constraints 
consisted of either an equality or an inequality associated with some linear combination
of the decision variables:

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \;\left\{ \begin{matrix} \le \\ = \\ \ge \end{matrix} \right\}\; b.$$

It is easy to convert constraints from one form to another. For example, an
inequality constraint

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \le b$$

can be converted to an equality constraint by adding a nonnegative variable, w,
which we call a slack variable:

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + w = b, \qquad w \ge 0.$$
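
A small Python sketch of this conversion, with made-up coefficients: for a given point x, the slack is the right-hand side b minus the left-hand side evaluated at x, and it is nonnegative exactly when x satisfies the original inequality. The function name `slack` is hypothetical.

```python
def slack(a, x, b):
    """Return w = b - sum_j a[j] * x[j]; w >= 0 exactly when a.x <= b."""
    return b - sum(aj * xj for aj, xj in zip(a, x))


a = [1.0, 2.0, 3.0]          # illustrative constraint coefficients
b = 10.0                     # illustrative right-hand side

x_ok = [1.0, 1.0, 1.0]       # a.x = 6, so the inequality holds
x_bad = [4.0, 2.0, 1.0]      # a.x = 11, so the inequality is violated

w_ok = slack(a, x_ok, b)     # nonnegative slack
w_bad = slack(a, x_bad, b)   # negative "slack" flags the violation
print(w_ok, w_bad)
```

With the slack appended, the pair (x, w) satisfies the equality form whenever x satisfies the inequality form, and vice versa.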

On  the  other  hand,  an  equality  constraint 

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b$$

can be converted to inequality form by introducing two inequality constraints:

$$\begin{aligned}
a_1 x_1 + a_2 x_2 + \cdots + a_n x_n &\le b \\
a_1 x_1 + a_2 x_2 + \cdots + a_n x_n &\ge b.
\end{aligned}$$
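
The conversions above can be sketched in Python as follows. The representation of a constraint as a `(coeffs, sense, rhs)` triple and the function name `to_less_than` are assumptions made for this illustration. The sketch also uses one step the text does not spell out here: a greater-than constraint becomes a less-than constraint when both sides are multiplied by −1.

```python
def to_less_than(constraints):
    """Rewrite each (coeffs, sense, rhs) constraint as one or two '<=' rows.

    '<=' rows are kept as they are, '>=' rows are multiplied by -1, and
    '=' rows are split into a '<=' row and a (negated) '>=' row.
    """
    rows = []
    for coeffs, sense, rhs in constraints:
        if sense not in ("<=", ">=", "="):
            raise ValueError(f"unknown constraint sense: {sense!r}")
        if sense in ("<=", "="):
            rows.append((list(coeffs), rhs))
        if sense in (">=", "="):
            rows.append(([-c for c in coeffs], -rhs))
    return rows


# Illustrative input: x1 + x2 = 3 and 2*x1 - x2 >= 4.
rows = to_less_than([
    ([1.0, 1.0], "=", 3.0),
    ([2.0, -1.0], ">=", 4.0),
])
for coeffs, rhs in rows:
    print(coeffs, "<=", rhs)
```

Every row of the result has the form "linear combination ≤ constant", so the whole system is in less-than form.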

Hence, in some sense, there is no a priori preference for how one poses the constraints
(as long as they are linear, of course). However, we shall also see that, from
a mathematical point of view, there is a preferred presentation. It is to pose the
inequalities as less-thans and to stipulate that all the decision variables be nonnegative.
Hence, the linear programming problem, as we study it, can be formulated as
follows:

$$\begin{array}{ll}
\text{maximize} & c_1 x_1 + c_2 x_2 + \cdots + c_n x_n \\
\text{subject to} & a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \le b_1 \\
 & a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \le b_2 \\
 & \qquad \vdots \\
 & a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \le b_m \\
 & x_1, x_2, \ldots, x_n \ge 0.
\end{array}$$

We refer to linear programs formulated this way as linear programs in standard
form. In our aim for consistency, we shall always use m to denote the number of
constraints, and n to denote the number of decision variables.

A  proposal  of  specific  values  for  the  decision  variables  is  called  a  solution.  A 
solution $(x_1, x_2, \ldots, x_n)$ is called feasible if it satisfies all of the constraints. It is
called optimal if in addition it attains the desired maximum. Some problems are just
simply infeasible, as the following example illustrates:

$$\begin{array}{lrcl}
\text{maximize} & 5x_1 + 4x_2 & & \\
\text{subject to} & x_1 + x_2 & \le & 2 \\
 & -2x_1 - 2x_2 & \le & -9 \\
 & x_1, x_2 & \ge & 0.
\end{array}$$

Indeed, the second constraint implies that $x_1 + x_2 \ge 4.5$, which contradicts the first
constraint.  If  a  problem  has  no  feasible  solution,  then  the  problem  itself  is  called 
infeasible. 
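
One way to make the contradiction in this example explicit (a short derivation added here for illustration) is to multiply the first constraint by 2 and add it to the second:

$$2(x_1 + x_2) + (-2x_1 - 2x_2) \le 2 \cdot 2 + (-9) \quad\Longrightarrow\quad 0 \le -5.$$

The left-hand side cancels to zero for every choice of $x_1$ and $x_2$, so the combined inequality can never hold, and hence no point satisfies both constraints at once.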

At  the  other  extreme  from  infeasible  problems,  one  finds  unbounded  problems. 
A  problem  is  unbounded  if  it  has  feasible  solutions  with  arbitrarily  large  objective 
values.  For  example,  consider 

$$\begin{array}{lrcl}
\text{maximize} & x_1 - 4x_2 & & \\
\text{subject to} & -2x_1 + x_2 & \le & -1 \\
 & -x_1 - 2x_2 & \le & -2 \\
 & x_1, x_2 & \ge & 0.
\end{array}$$

Here, we could set $x_2$ to zero and let $x_1$ be arbitrarily large. As long as $x_1$ is greater
than 2 the solution will be feasible, and as it gets large the objective function does
too.  Hence,  the  problem  is  unbounded.  In  addition  to  finding  optimal  solutions 
to  linear  programming  problems,  we  shall  also  be  interested  in  detecting  when  a 
problem  is  infeasible  or  unbounded. 
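
The unbounded example can be checked numerically with a short Python sketch. It walks along the ray on which the second variable is zero and the first variable t grows, confirming that every such point is feasible while the objective grows with t. The helper names `is_feasible` and `objective` are hypothetical, and checking a handful of points illustrates unboundedness but does not by itself prove it.

```python
# Data of the unbounded example, in standard form (maximize c.x, A x <= b, x >= 0).
A = [[-2.0, 1.0],
     [-1.0, -2.0]]
b = [-1.0, -2.0]
c = [1.0, -4.0]


def is_feasible(A, b, x, tol=1e-9):
    """True if x >= 0 and every row of A x <= b holds (up to tol)."""
    if any(xj < -tol for xj in x):
        return False
    return all(
        sum(aij * xj for aij, xj in zip(row, x)) <= bi + tol
        for row, bi in zip(A, b)
    )


def objective(c, x):
    """Objective value c.x."""
    return sum(cj * xj for cj, xj in zip(c, x))


# Points (t, 0) with t >= 2 along the ray described in the text.
ray = [(t, 0.0) for t in (2.0, 10.0, 1000.0, 1e6)]
all_feasible = all(is_feasible(A, b, x) for x in ray)
values = [objective(c, x) for x in ray]
print(all_feasible, values)
```

Along this ray the objective equals t, so no finite value can serve as the maximum.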


Exercises 

1.1  A  steel  company  must  decide  how  to  allocate  next  week’s  time  on  a  rolling 
mill,  which  is  a  machine  that  takes  unfinished  slabs  of  steel  as  input  and 
can  produce  either  of  two  semi-finished  products:  bands  and  coils.  The 
mill’s  two  products  come  off  the  rolling  line  at  different  rates: 

Bands  200  tons/h 
Coils  140  tons/h. 

They  also  produce  different  profits: 

Bands  $25/ton 
Coils  $30/ton. 

Based  on  currently  booked  orders,  the  following  upper  bounds  are  placed 
on  the  amount  of  each  product  to  produce: 

Bands  6,000  tons 
Coils  4,000 tons.

Given that there are 40 h of production time available this week, the problem
is to decide how many tons of bands and how many tons of coils
should be produced to yield the greatest profit. Formulate this problem
as a linear programming problem. Can you solve this problem by inspection?

1.2  A  small  airline,  Ivy  Air,  flies  between  three  cities:  Ithaca,  Newark,  and 
Boston.  They  offer  several  flights  but,  for  this  problem,  let  us  focus  on 
the  Friday  afternoon  flight  that  departs  from  Ithaca,  stops  in  Newark,  and 
continues  to  Boston.  There  are  three  types  of  passengers: 

(a)  Those  traveling  from  Ithaca  to  Newark. 

(b)  Those  traveling  from  Newark  to  Boston. 

(c)  Those  traveling  from  Ithaca  to  Boston. 

The aircraft is a small commuter plane that seats 30 passengers. The airline
offers three fare classes:

(a) Y class: full coach.

(b) B class: nonrefundable.

(c) M class: nonrefundable, 3-week advanced purchase.

Ticket  prices,  which  are  largely  determined  by  external  influences  (i.e., 
competitors),  have  been  set  and  advertised  as  follows: 


       Ithaca-Newark   Newark-Boston   Ithaca-Boston
Y           300             160             360
B           220             130             280
M           100              80             140

Based  on  past  experience,  demand  forecasters  at  Ivy  Air  have  determined 
the  following  upper  bounds  on  the  number  of  potential  customers  in  each 
of  the  nine  possible  origin-destination/fare-class  combinations: 


       Ithaca-Newark   Newark-Boston   Ithaca-Boston
Y             4               8               3
B             8              13              10
M            22              20              18

The goal is to decide how many tickets from each of the nine
origin-destination/fare-class combinations to sell. The constraints are that the
plane cannot be overbooked on either of the two legs of the flight and that
the number of tickets made available cannot exceed the forecasted maximum
demand. The objective is to maximize the revenue. Formulate this
problem as a linear programming problem.

1.3 Suppose that $Y$ is a random variable taking on one of $n$ known values:

$$a_1, a_2, \ldots, a_n.$$

Suppose we know that $Y$ either has distribution $p$ given by

$$P(Y = a_j) = p_j$$

or it has distribution $q$ given by

$$P(Y = a_j) = q_j.$$

Of course, the numbers $p_j$, $j = 1, 2, \ldots, n$, are nonnegative and sum to
one. The same is true for the $q_j$'s. Based on a single observation of $Y$, we
wish to guess whether it has distribution $p$ or distribution $q$. That is, for
each possible outcome $a_j$, we will assert with probability $x_j$ that the
distribution is $p$ and with probability $1 - x_j$ that the distribution is $q$. We wish
to determine the probabilities $x_j$, $j = 1, 2, \ldots, n$, such that the probability
of saying the distribution is $p$ when in fact it is $q$ is no larger
than $\beta$, where $\beta$ is some small positive value (such as 0.05). Furthermore,
given this constraint, we wish to maximize the probability that we say the
distribution is $p$ when in fact it is $p$. Formulate this maximization problem
as a linear programming problem.

Notes 

The subject of linear programming has its roots in the study of linear
inequalities, which can be traced as far back as 1826 to the work of Fourier. Since then,
many mathematicians have proved special cases of the most important result in the
subject, the duality theorem. The applied side of the subject got its start in 1939
when L.V. Kantorovich noted the practical importance of a certain class of linear
programming problems and gave an algorithm for their solution; see Kantorovich
(1960). Unfortunately, for several years, Kantorovich's work was unknown in the
West and unnoticed in the East. The subject really took off in 1947 when G.B.
Dantzig invented the simplex method for solving the linear programming problems
that arose in U.S. Air Force planning problems. The earliest published accounts
of Dantzig's work appeared in 1951 (Dantzig 1951a,b). His monograph (Dantzig
1963) remains an important reference. In the same year that Dantzig invented the
simplex method, T.C. Koopmans showed that linear programming provided the
appropriate model for the analysis of classical economic theories. In 1975, the Royal
Swedish Academy of Sciences awarded the Nobel Prize in economic science to
L.V. Kantorovich and T.C. Koopmans "for their contributions to the theory of
optimum allocation of resources." Apparently the academy regarded Dantzig's work
as too mathematical for the prize in economics (and there is no Nobel Prize in
mathematics).

The  textbooks  by  Bradley  et  al.  (1977),  Bazaraa  et  al.  (1977),  and  Hillier  and 
Lieberman  (1977)  are  known  for  their  extensive  collections  of  interesting  practical 
applications. 


CHAPTER  2 


The  Simplex  Method 


In this chapter we present the simplex method as it applies to linear
programming problems in standard form.


1.  An  Example 

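We first illustrate how the simplex method works on a specific example:

$$
\begin{array}{ll}
\text{maximize}   & 5x_1 + 4x_2 + 3x_3 \\
\text{subject to} & 2x_1 + 3x_2 + x_3 \le 5 \\
                  & 4x_1 + x_2 + 2x_3 \le 11 \\
                  & 3x_1 + 4x_2 + 2x_3 \le 8 \\
                  & x_1, x_2, x_3 \ge 0.
\end{array}
\tag{2.1}
$$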

We start by adding so-called slack variables. For each of the less-than inequalities
in (2.1) we introduce a new variable that represents the difference between the
right-hand side and the left-hand side. For example, for the first inequality,

$$2x_1 + 3x_2 + x_3 \le 5,$$

we introduce the slack variable $w_1$ defined by

$$w_1 = 5 - 2x_1 - 3x_2 - x_3.$$

It is clear then that this definition of $w_1$, together with the stipulation that $w_1$ be
nonnegative, is equivalent to the original constraint. We carry out this procedure for
each of the less-than constraints to get an equivalent representation of the problem:

$$
\begin{array}{ll}
\text{maximize}   & \zeta = 5x_1 + 4x_2 + 3x_3 \\
\text{subject to} & w_1 = 5 - 2x_1 - 3x_2 - x_3 \\
                  & w_2 = 11 - 4x_1 - x_2 - 2x_3 \\
                  & w_3 = 8 - 3x_1 - 4x_2 - 2x_3 \\
                  & x_1, x_2, x_3, w_1, w_2, w_3 \ge 0.
\end{array}
\tag{2.2}
$$

Note that we have included a notation, $\zeta$, for the value of the objective function,
$5x_1 + 4x_2 + 3x_3$.
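As a small illustration of the slack-variable definition, the following Python sketch evaluates the slacks and the objective of example (2.1) at a given point. The function names `slacks`, `objective`, and `is_feasible` are chosen here for illustration and do not come from the book's software.

```python
# Data of example (2.1): maximize c.x subject to A x <= b, x >= 0.
A = [[2, 3, 1],
     [4, 1, 2],
     [3, 4, 2]]
b = [5, 11, 8]
c = [5, 4, 3]

def slacks(x):
    """Return w_i = b_i - sum_j a_ij * x_j for every constraint."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(A, b)]

def objective(x):
    """Return zeta = sum_j c_j * x_j."""
    return sum(cj * xj for cj, xj in zip(c, x))

def is_feasible(x):
    """A point is feasible when all variables and all slacks are nonnegative."""
    return all(xj >= 0 for xj in x) and all(wi >= 0 for wi in slacks(x))

print(slacks([0, 0, 0]), objective([0, 0, 0]), is_feasible([0, 0, 0]))
```

A point violates a constraint exactly when the corresponding slack is negative, which is why nonnegativity of the slacks can replace the original inequalities.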

The simplex method is an iterative process in which we start with a solution
$x_1, x_2, \ldots, w_3$ that satisfies the equations and nonnegativities in (2.2) and then look
for a new solution $\bar{x}_1, \bar{x}_2, \ldots, \bar{w}_3$, which is better in the sense that it has a larger
objective function value:

$$5\bar{x}_1 + 4\bar{x}_2 + 3\bar{x}_3 > 5x_1 + 4x_2 + 3x_3.$$

We continue this process until we arrive at a solution that can't be improved. This
final solution is then an optimal solution.

To start the iterative process, we need an initial feasible solution $x_1, x_2, \ldots, w_3$.
For our example, this is easy. We simply set all the original variables to zero and
use the defining equations to determine the slack variables:

$$x_1 = 0,\ x_2 = 0,\ x_3 = 0,\ w_1 = 5,\ w_2 = 11,\ w_3 = 8.$$

The objective function value associated with this solution is $\zeta = 0$.

We now ask whether this solution can be improved. Since the coefficient of
$x_1$ in the objective function is positive, if we increase the value of $x_1$ from zero to
some positive value, we will increase $\zeta$. But as we change its value, the values of
the slack variables will also change. We must make sure that we don't let any of
them go negative. Since $x_2$ and $x_3$ are currently set to 0, we see that $w_1 = 5 - 2x_1$,
and so keeping $w_1$ nonnegative imposes the restriction that $x_1$ must not exceed
5/2. Similarly, the nonnegativity of $w_2$ imposes the bound that $x_1 \le 11/4$, and
the nonnegativity of $w_3$ introduces the bound that $x_1 \le 8/3$. Since all of these
conditions must be met, we see that $x_1$ cannot be made larger than the smallest of
these bounds: $x_1 \le 5/2$. Our new, improved solution then is

$$x_1 = \tfrac{5}{2},\ x_2 = 0,\ x_3 = 0,\ w_1 = 0,\ w_2 = 1,\ w_3 = \tfrac{1}{2}.$$
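The bound computation in this first step can be checked with exact rational arithmetic. The sketch below is only an illustration; the dictionary `rows` and the other names are assumptions made for this example.

```python
from fractions import Fraction as F

# With x2 = x3 = 0, each slack in (2.2) has the form w_i = b_i - a_i * x1.
rows = {"w1": (5, 2), "w2": (11, 4), "w3": (8, 3)}  # name: (b_i, a_i)

# Each row with a_i > 0 limits x1 to at most b_i / a_i.
bounds = {name: F(b, a) for name, (b, a) in rows.items() if a > 0}

# The tightest bound decides how far x1 can be raised.
leaving = min(bounds, key=bounds.get)
step = bounds[leaving]

# Slack values once x1 has been raised to `step`.
new_slacks = {name: b - a * step for name, (b, a) in rows.items()}

print(leaving, step, new_slacks, 5 * step)
```

The slack that hits zero first (here the one attaining the smallest bound) is the variable that trades places with $x_1$ in the next dictionary.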


This first step was straightforward. It is less obvious how to proceed. What
made the first step easy was the fact that we had one group of variables that were
initially zero and we had the rest explicitly expressed in terms of these. This
property can be arranged even for our new solution. Indeed, we simply must rewrite the
equations in (2.2) in such a way that $x_1$, $w_2$, $w_3$, and $\zeta$ are expressed as functions of
$w_1$, $x_2$, and $x_3$. That is, the roles of $x_1$ and $w_1$ must be swapped. To this end, we
use the equation for $w_1$ in (2.2) to solve for $x_1$:

$$x_1 = \tfrac{5}{2} - \tfrac{1}{2} w_1 - \tfrac{3}{2} x_2 - \tfrac{1}{2} x_3.$$

The equations for $w_2$, $w_3$, and $\zeta$ must also be doctored so that $x_1$ does not appear
on the right. The easiest way to accomplish this is to do so-called row operations on
the equations in (2.2). For example, if we take the equation for $w_2$ and subtract two
times the equation for $w_1$ and then bring the $w_1$ term to the right-hand side, we get

$$w_2 = 1 + 2w_1 + 5x_2.$$


Performing analogous row operations for $w_3$ and $\zeta$, we can rewrite the equations
in (2.2) as

$$
\begin{array}{rcl}
\zeta & = & 12.5 - 2.5w_1 - 3.5x_2 + 0.5x_3 \\
x_1 & = & 2.5 - 0.5w_1 - 1.5x_2 - 0.5x_3 \\
w_2 & = & 1 + 2w_1 + 5x_2 \\
w_3 & = & 0.5 + 1.5w_1 + 0.5x_2 - 0.5x_3.
\end{array}
\tag{2.3}
$$

Note that we can recover our current solution by setting the "independent" variables
to zero and using the equations to read off the values for the "dependent" variables.
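The swap of $x_1$ and $w_1$ can be mechanized. The following Python sketch stores each equation as a mapping from variable names to coefficients (with the key `"1"` holding the constant term, a convention assumed here) and performs the substitution exactly. The function name `pivot` and the data layout are illustrative assumptions, not the book's software.

```python
from fractions import Fraction as F

def pivot(d, entering, leaving):
    """Return a new dictionary with `entering` on the left and `leaving` on the right.

    d maps each left-hand variable to {name: coefficient}; key "1" is the constant.
    """
    row = d[leaving]
    a = F(row[entering])  # coefficient of the entering variable; must be nonzero
    # Solve the leaving row for the entering variable.
    solved = {v: -F(coef) / a for v, coef in row.items() if v != entering}
    solved[leaving] = 1 / a
    new = {entering: solved}
    # Substitute that expression into every other row (including zeta).
    for name, r in d.items():
        if name == leaving:
            continue
        k = F(r.get(entering, 0))
        out = {v: F(coef) for v, coef in r.items() if v != entering}
        for v, coef in solved.items():
            out[v] = out.get(v, F(0)) + k * coef
        new[name] = out
    return new

# Dictionary (2.2).
d22 = {
    "zeta": {"1": 0, "x1": 5, "x2": 4, "x3": 3},
    "w1": {"1": 5, "x1": -2, "x2": -3, "x3": -1},
    "w2": {"1": 11, "x1": -4, "x2": -1, "x3": -2},
    "w3": {"1": 8, "x1": -3, "x2": -4, "x3": -2},
}
d23 = pivot(d22, entering="x1", leaving="w1")
for name, r in d23.items():
    print(name, "=", {v: str(coef) for v, coef in r.items()})
```

Setting every right-hand variable to zero leaves only the `"1"` entries, which are the values of the left-hand variables in the current solution.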

Now we see that increasing $w_1$ or $x_2$ will bring about a decrease in the
objective function value, and so $x_3$, being the only variable with a positive
coefficient, is the only independent variable that we can increase to obtain a further
increase in the objective function. Again, we need to determine how much this
variable can be increased without violating the requirement that all the dependent
variables remain nonnegative. This time we see that the equation for $w_2$ is not
affected by changes in $x_3$, but the equations for $x_1$ and $w_3$ do impose bounds, namely
$x_3 \le 5$ and $x_3 \le 1$, respectively. The latter is the tighter bound, and so the new
solution is

$$x_1 = 2,\ x_2 = 0,\ x_3 = 1,\ w_1 = 0,\ w_2 = 1,\ w_3 = 0.$$

The corresponding objective function value is $\zeta = 13$.

Once again, we must determine whether it is possible to increase the objective
function further and, if so, how. Therefore, we need to write our equations with
$\zeta$, $x_1$, $w_2$, and $x_3$ written as functions of $w_1$, $x_2$, and $w_3$. Solving the last equation
in (2.3) for $x_3$, we get

$$x_3 = 1 + 3w_1 + x_2 - 2w_3.$$

Also, performing the appropriate row operations, we can eliminate $x_3$ from the other
equations. The result of these operations is

$$
\begin{array}{rcl}
\zeta & = & 13 - w_1 - 3x_2 - w_3 \\
x_1 & = & 2 - 2w_1 - 2x_2 + w_3 \\
w_2 & = & 1 + 2w_1 + 5x_2 \\
x_3 & = & 1 + 3w_1 + x_2 - 2w_3.
\end{array}
\tag{2.4}
$$

We are now ready to begin the third iteration. The first step is to identify an
independent variable for which an increase in its value would produce a corresponding
increase in $\zeta$. But this time there is no such variable, since all the variables have
negative coefficients in the expression for $\zeta$. This fact not only brings the simplex
method to a standstill but also proves that the current solution is optimal. The reason
is quite simple. Since the equations in (2.4) are completely equivalent to those in
(2.2) and, since all the variables must be nonnegative, it follows that $\zeta \le 13$ for
every feasible solution. Since our current solution attains the value of 13, we see
that it is indeed optimal.

1.1. Dictionaries, Bases, Etc. The systems of equations (2.2), (2.3), and (2.4)
that we have encountered along the way are called dictionaries. With the
exception of $\zeta$, the variables that appear on the left (i.e., the variables that we have been
referring to as the dependent variables) are called basic variables. Those on the
right (i.e., the independent variables) are called nonbasic variables. The solutions
we have obtained by setting the nonbasic variables to zero are called basic feasible
solutions.


2. The Simplex Method


Consider the general linear programming problem presented in standard form:

$$
\begin{array}{lll}
\text{maximize}   & \sum_{j=1}^{n} c_j x_j & \\
\text{subject to} & \sum_{j=1}^{n} a_{ij} x_j \le b_i & i = 1, 2, \ldots, m \\
                  & x_j \ge 0 & j = 1, 2, \ldots, n.
\end{array}
$$

Our first task is to introduce slack variables and a name for the objective function
value:

$$
\begin{array}{rcll}
\zeta & = & \sum_{j=1}^{n} c_j x_j & \\
w_i & = & b_i - \sum_{j=1}^{n} a_{ij} x_j & i = 1, 2, \ldots, m.
\end{array}
\tag{2.5}
$$

As we saw in our example, as the simplex method proceeds, the slack variables
become intertwined with the original variables, and the whole collection is treated the
same. Therefore, it is at times convenient to have a notation in which the slack
variables are more or less indistinguishable from the original variables. So we simply
add them to the end of the list of x-variables:

$$(x_1, \ldots, x_n, w_1, \ldots, w_m) = (x_1, \ldots, x_n, x_{n+1}, \ldots, x_{n+m}).$$

That is, we let $x_{n+i} = w_i$. With this notation, we can rewrite (2.5) as

$$
\begin{array}{rcll}
\zeta & = & \sum_{j=1}^{n} c_j x_j & \\
x_{n+i} & = & b_i - \sum_{j=1}^{n} a_{ij} x_j & i = 1, 2, \ldots, m.
\end{array}
$$

This is the starting dictionary. As the simplex method progresses, it moves from one
dictionary to another in its search for an optimal solution. Each dictionary has $m$
basic variables and $n$ nonbasic variables. Let $\mathcal{B}$ denote the collection of indices from
$\{1, 2, \ldots, n + m\}$ corresponding to the basic variables, and let $\mathcal{N}$ denote the indices
corresponding to the nonbasic variables. Initially, we have $\mathcal{N} = \{1, 2, \ldots, n\}$ and
$\mathcal{B} = \{n + 1, n + 2, \ldots, n + m\}$, but this of course changes after the first iteration.
Down the road, the current dictionary will look like this:

$$
\begin{array}{rcll}
\zeta & = & \bar{\zeta} + \sum_{j \in \mathcal{N}} \bar{c}_j x_j & \\
x_i & = & \bar{b}_i - \sum_{j \in \mathcal{N}} \bar{a}_{ij} x_j & i \in \mathcal{B}.
\end{array}
\tag{2.6}
$$

Note that we have put bars over the coefficients to indicate that they change as the
algorithm progresses.

Within each iteration of the simplex method, exactly one variable goes from
nonbasic to basic and exactly one variable goes from basic to nonbasic. We saw this
process in our example, but let us now describe it in general.

The variable that goes from nonbasic to basic is called the entering variable. It
is chosen with the aim of increasing $\zeta$; that is, one whose coefficient is positive: pick
$k$ from $\{j \in \mathcal{N} : \bar{c}_j > 0\}$. Note that if this set is empty, then the current solution
is optimal. If the set consists of more than one element (as is normally the case),
then we have a choice of which element to pick. There are several possible selection
criteria, some of which will be discussed in the next chapter. For now, suffice it to
say that we usually pick an index $k$ having the largest coefficient (which again could
leave us with a choice).

The variable that goes from basic to nonbasic is called the leaving variable. It
is chosen to preserve nonnegativity of the current basic variables. Once we have
decided that $x_k$ will be the entering variable, its value will be increased from zero
to a positive value. This increase will change the values of the basic variables:

$$x_i = \bar{b}_i - \bar{a}_{ik} x_k, \qquad i \in \mathcal{B}.$$

We must ensure that each of these variables remains nonnegative. Hence, we require
that

$$\bar{b}_i - \bar{a}_{ik} x_k \ge 0, \qquad i \in \mathcal{B}. \tag{2.7}$$

Of these expressions, the only ones that can go negative as $x_k$ increases are those
for which $\bar{a}_{ik}$ is positive; the rest remain fixed or increase. Hence, we can restrict
our attention to those $i$'s for which $\bar{a}_{ik}$ is positive. And for such an $i$, the value of
$x_k$ at which the expression becomes zero is

$$x_k = \bar{b}_i / \bar{a}_{ik}.$$

Since we don't want any of these to go negative, we must raise $x_k$ only to the
smallest of all of these values:

$$x_k = \min_{i \in \mathcal{B} : \bar{a}_{ik} > 0} \bar{b}_i / \bar{a}_{ik}.$$

Therefore, with a certain amount of latitude remaining, the rule for selecting the
leaving variable is pick $l$ from $\{i \in \mathcal{B} : \bar{a}_{ik} > 0 \text{ and } \bar{b}_i / \bar{a}_{ik} \text{ is minimal}\}$.

The rule just given for selecting a leaving variable describes exactly the process
by which we use the rule in practice. That is, we look only at those variables for
which $\bar{a}_{ik}$ is positive and among those we select one with the smallest value of the
ratio $\bar{b}_i / \bar{a}_{ik}$. There is, however, another, entirely equivalent, way to write this rule
which we will often use. To derive this alternate expression we use the convention
that $0/0 = 0$ and rewrite inequalities (2.7) as

$$\frac{1}{x_k} \ge \frac{\bar{a}_{ik}}{\bar{b}_i}, \qquad i \in \mathcal{B}$$

(we shall discuss shortly what happens when one of these ratios is an indeterminate
form 0/0 as well as what it means if none of the ratios are positive). Since we wish
to take the largest possible increase in $x_k$, we see that

$$x_k = \left( \max_{i \in \mathcal{B}} \frac{\bar{a}_{ik}}{\bar{b}_i} \right)^{-1}.$$

Hence, the rule for selecting the leaving variable is as follows: pick $l$ from
$\{i \in \mathcal{B} : \bar{a}_{ik} / \bar{b}_i \text{ is maximal}\}$.

The main difference between these two ways of writing the rule is that in one
we minimize the ratio $\bar{b}_i / \bar{a}_{ik}$ whereas in the other we maximize the reciprocal
ratio $\bar{a}_{ik} / \bar{b}_i$. Of course, in the minimize formulation one must take care about the sign
of the $\bar{a}_{ik}$'s. In the remainder of this book we will encounter these types of ratios
often. We will always write them in the maximize form since that is shorter to write,
acknowledging full well the fact that it is often more convenient, in practice, to do
it the other way.
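To see that the two forms of the rule agree, the following Python sketch applies both to the same column of data. It is restricted to the case where every $\bar{b}_i$ is strictly positive, so the 0/0 convention is not needed; the function names are assumptions made for this illustration.

```python
from fractions import Fraction as F

def leaving_min_form(b, a):
    """Row index i with a[i] > 0 that minimizes b[i] / a[i]."""
    candidates = [i for i in range(len(b)) if a[i] > 0]
    return min(candidates, key=lambda i: F(b[i], a[i]))

def leaving_max_form(b, a):
    """Row index i that maximizes a[i] / b[i]; assumes every b[i] > 0."""
    return max(range(len(b)), key=lambda i: F(a[i], b[i]))

# First pivot of the example: constants of (2.2) and the coefficients of x1.
b = [5, 11, 8]
a = [2, 4, 3]
print(leaving_min_form(b, a), leaving_max_form(b, a))
```

The minimize form has to filter out rows with nonpositive coefficients explicitly, whereas in the maximize form such rows produce nonpositive ratios and lose automatically whenever some ratio is positive.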

Once  the  leaving-basic  and  entering-nonbasic  variables  have  been  selected,  the 
move  from  the  current  dictionary  to  the  new  dictionary  involves  appropriate  row 
operations  to  achieve  the  interchange.  This  step  from  one  dictionary  to  the  next  is 
called  a  pivot. 

As  mentioned  above,  there  is  often  more  than  one  choice  for  the  entering  and 
the  leaving  variables.  Particular  rules  that  make  the  choice  unambiguous  are  called 
pivot  rules. 
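The pieces described in this section (entering variable, leaving variable, pivot) can be assembled into a compact solver. The Python sketch below is a minimal illustration under stated assumptions: the problem is in standard form with nonnegative right-hand sides, the entering variable is chosen by the largest-coefficient rule, ties are broken by lowest position, and no safeguard against cycling is included. It is not the software that accompanies the book.

```python
from fractions import Fraction as F

def simplex(A, b, c):
    """Maximize c.x subject to A x <= b, x >= 0, assuming every b_i >= 0.

    The dictionary is kept as x_basic[i] = b[i] - sum_j A[i][j] * x_nonbasic[j]
    and zeta = z + sum_j c[j] * x_nonbasic[j].
    """
    m, n = len(A), len(c)
    A = [[F(v) for v in row] for row in A]
    b = [F(v) for v in b]
    c = [F(v) for v in c]
    z = F(0)
    nonbasic = list(range(n))       # original variables
    basic = list(range(n, n + m))   # slack variables
    while True:
        # Entering variable: largest objective coefficient.
        k = max(range(n), key=lambda j: c[j])
        if c[k] <= 0:
            break                   # no positive coefficient: optimal
        # Leaving variable: smallest ratio b_i / a_ik over rows with a_ik > 0.
        rows = [i for i in range(m) if A[i][k] > 0]
        if not rows:
            return "unbounded", None, None
        l = min(rows, key=lambda i: b[i] / A[i][k])
        # Pivot: solve row l for the entering variable ...
        piv = A[l][k]
        b[l] = b[l] / piv
        A[l] = [v / piv for v in A[l]]
        A[l][k] = 1 / piv           # column k now belongs to the leaving variable
        # ... and substitute into the remaining rows and the objective.
        for i in range(m):
            if i == l:
                continue
            f = A[i][k]
            b[i] -= f * b[l]
            A[i] = [A[i][j] - f * A[l][j] for j in range(n)]
            A[i][k] = -f * A[l][k]
        f = c[k]
        z += f * b[l]
        c = [c[j] - f * A[l][j] for j in range(n)]
        c[k] = -f * A[l][k]
        basic[l], nonbasic[k] = nonbasic[k], basic[l]
    x = [F(0)] * (n + m)
    for i, v in enumerate(basic):
        x[v] = b[i]
    return "optimal", z, x[:n]

status, value, x = simplex([[2, 3, 1], [4, 1, 2], [3, 4, 2]], [5, 11, 8], [5, 4, 3])
print(status, value, x)
```

Using `Fraction` keeps every dictionary exact, which makes it easy to compare intermediate results against hand computations.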


3.  Initialization 

In  the  previous  section,  we  presented  the  simplex  method.  However,  we  only 
considered  problems  for  which  the  right-hand  sides  were  all  nonnegative.  This 
ensured  that  the  initial  dictionary  was  feasible.  In  this  section,  we  discuss  what 
to  do  when  this  is  not  the  case. 

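Given a linear programming problem

$$
\begin{array}{lll}
\text{maximize}   & \sum_{j=1}^{n} c_j x_j & \\
\text{subject to} & \sum_{j=1}^{n} a_{ij} x_j \le b_i & i = 1, 2, \ldots, m \\
                  & x_j \ge 0 & j = 1, 2, \ldots, n,
\end{array}
$$

the initial dictionary that we introduced in the preceding section was

$$
\begin{array}{rcll}
\zeta & = & \sum_{j=1}^{n} c_j x_j & \\
w_i & = & b_i - \sum_{j=1}^{n} a_{ij} x_j & i = 1, 2, \ldots, m.
\end{array}
$$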

The solution associated with this dictionary is obtained by setting each $x_j$ to zero
and setting each $w_i$ equal to the corresponding $b_i$. This solution is feasible if and
only if all the right-hand sides are nonnegative. But what if they are not? We handle
this difficulty by introducing an auxiliary problem for which

(1) A feasible dictionary is easy to find and

(2) The optimal dictionary provides a feasible dictionary for the original problem.

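The auxiliary problem is

$$
\begin{array}{lll}
\text{maximize}   & -x_0 & \\
\text{subject to} & \sum_{j=1}^{n} a_{ij} x_j - x_0 \le b_i & i = 1, 2, \ldots, m \\
                  & x_j \ge 0 & j = 0, 1, \ldots, n.
\end{array}
$$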

It is easy to give a feasible solution to this auxiliary problem. Indeed, we simply
set $x_j = 0$, for $j = 1, \ldots, n$, and then pick $x_0$ sufficiently large. It is also easy
to see that the original problem has a feasible solution if and only if the auxiliary
problem has a feasible solution with $x_0 = 0$. In other words, the original problem
has a feasible solution if and only if the optimal solution to the auxiliary problem
has objective value zero.

Even though the auxiliary problem clearly has feasible solutions, we have not
yet shown that it has an easily obtained feasible dictionary. It is best to illustrate
how to obtain a feasible dictionary with an example:

$$
\begin{array}{ll}
\text{maximize}   & -2x_1 - x_2 \\
\text{subject to} & -x_1 + x_2 \le -1 \\
                  & -x_1 - 2x_2 \le -2 \\
                  & x_2 \le 1 \\
                  & x_1, x_2 \ge 0.
\end{array}
$$

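The auxiliary problem is

$$
\begin{array}{ll}
\text{maximize}   & -x_0 \\
\text{subject to} & -x_1 + x_2 - x_0 \le -1 \\
                  & -x_1 - 2x_2 - x_0 \le -2 \\
                  & x_2 - x_0 \le 1 \\
                  & x_0, x_1, x_2 \ge 0.
\end{array}
$$

Next we introduce slack variables and write down an initial infeasible dictionary:

$$
\begin{array}{rcl}
\xi & = & -x_0 \\
w_1 & = & -1 + x_1 - x_2 + x_0 \\
w_2 & = & -2 + x_1 + 2x_2 + x_0 \\
w_3 & = & 1 - x_2 + x_0.
\end{array}
$$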

This dictionary is infeasible, but it is easy to convert it into a feasible dictionary. In
fact, all we need to do is one pivot with variable $x_0$ entering and the "most infeasible
variable," $w_2$, leaving the basis:

$$
\begin{array}{rcl}
\xi & = & -2 + x_1 + 2x_2 - w_2 \\
w_1 & = & 1 - 3x_2 + w_2 \\
x_0 & = & 2 - x_1 - 2x_2 + w_2 \\
w_3 & = & 3 - x_1 - 3x_2 + w_2.
\end{array}
$$

Note that we now have a feasible dictionary, so we can apply the simplex method as
defined earlier in this chapter. For the first step, we pick $x_2$ to enter and $w_1$ to leave
the basis:

$$
\begin{array}{rcl}
\xi & = & -1.33 + x_1 - 0.67w_1 - 0.33w_2 \\
x_2 & = & 0.33 - 0.33w_1 + 0.33w_2 \\
x_0 & = & 1.33 - x_1 + 0.67w_1 + 0.33w_2 \\
w_3 & = & 2 - x_1 + w_1.
\end{array}
$$

Now, for the second step, we pick $x_1$ to enter and $x_0$ to leave the basis:

$$
\begin{array}{rcl}
\xi & = & 0 - x_0 \\
x_2 & = & 0.33 - 0.33w_1 + 0.33w_2 \\
x_1 & = & 1.33 - x_0 + 0.67w_1 + 0.33w_2 \\
w_3 & = & 0.67 + x_0 + 0.33w_1 - 0.33w_2.
\end{array}
$$

This dictionary is optimal for the auxiliary problem. We now drop $x_0$ from the
equations and reintroduce the original objective function:

$$\zeta = -2x_1 - x_2 = -3 - w_1 - w_2.$$

Hence, the starting feasible dictionary for the original problem is

$$
\begin{array}{rcl}
\zeta & = & -3 - w_1 - w_2 \\
x_2 & = & 0.33 - 0.33w_1 + 0.33w_2 \\
x_1 & = & 1.33 + 0.67w_1 + 0.33w_2 \\
w_3 & = & 0.67 + 0.33w_1 - 0.33w_2.
\end{array}
$$

As it turns out, this dictionary is optimal for the original problem (since the
coefficients of all the variables in the equation for $\zeta$ are negative), but we can't expect
to be this lucky in general. All we normally can expect is that the dictionary so
obtained will be feasible for the original problem, at which point we continue to apply
the simplex method until an optimal solution is reached.

The process of solving the auxiliary problem to find an initial feasible solution
is often referred to as Phase I, whereas the process of going from a feasible solution
to an optimal solution is called Phase II.
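The reason a single pivot makes the auxiliary dictionary feasible can be seen from the constants alone. In the auxiliary dictionary every row has the form $w_i = b_i - \sum_j a_{ij} x_j + x_0$, so solving the most infeasible row $r$ for $x_0$ gives $x_0$ the constant $-b_r$ and every other row the constant $b_i - b_r$. The Python sketch below (function name chosen here for illustration) computes these constants.

```python
def first_phase1_pivot(b):
    """Constants of the auxiliary dictionary after x0 enters the basis.

    Returns (r, constants), where r is the index of the leaving row (the one
    with the most negative right-hand side) and constants[i] is the new
    constant term of row i; row r now holds x0. Returns None when every
    b_i is already nonnegative, since no Phase I pivot is needed then.
    """
    r = min(range(len(b)), key=lambda i: b[i])
    if b[r] >= 0:
        return None
    constants = [bi - b[r] for bi in b]  # b_i - b_r >= 0 because b_r is minimal
    constants[r] = -b[r]                 # value taken by x0
    return r, constants

# Right-hand sides of the example in this section.
print(first_phase1_pivot([-1, -2, 1]))
```

For the example, the constants 1, 2, and 3 match those of $w_1$, $x_0$, and $w_3$ in the dictionary obtained after the first pivot.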

4.  Unboundedness 

In  this  section,  we  discuss  how  to  detect  when  the  objective  function  value  is 
unbounded. 

Let us now take a closer look at the "leaving variable" computation: pick $l$ from
$\{i \in \mathcal{B} : \bar{a}_{ik} / \bar{b}_i \text{ is maximal}\}$. We avoided the issue before, but now we must face
what to do if a denominator in one of these ratios vanishes. If the numerator is
nonzero, then it is easy to see that the ratio should be interpreted as plus or minus
infinity depending on the sign of the numerator. For the case of 0/0, the correct
convention (as we'll see momentarily) is to take this as a zero.

What if all of the ratios, $\bar{a}_{ik} / \bar{b}_i$, are nonpositive? In that case, none of the basic
variables will become zero as the entering variable increases. Hence, the entering
variable can be increased indefinitely to produce an arbitrarily large objective value.
In such situations, we say that the problem is unbounded. For example, consider the
following dictionary:

$$
\begin{array}{rcl}
\zeta & = & 5 + x_3 - x_1 \\
x_2 & = & 5 + 2x_3 - 3x_1 \\
x_4 & = & 7 - 4x_1 \\
x_5 & = & x_1.
\end{array}
$$

The entering variable is $x_3$ and the ratios are

$$-2/5, \quad -0/7, \quad 0/0.$$

Since none of these ratios is positive, the problem is unbounded.

In the next chapter, we will investigate what happens when some of these ratios
take the value $+\infty$.
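The ratio conventions and the unboundedness test of this section can be written down directly. In the Python sketch below, `ratio` and `is_unbounded` are names assumed for illustration; `a[i]` stands for $\bar{a}_{ik}$ (so a row $x_2 = 5 + 2x_3 - \cdots$ contributes $a = -2$ for the entering variable $x_3$) and `b[i]` for $\bar{b}_i$.

```python
import math
from fractions import Fraction as F

def ratio(a, b):
    """a / b with the conventions 0/0 = 0 and (nonzero)/0 = plus or minus infinity."""
    if b == 0:
        if a == 0:
            return F(0)
        return math.inf if a > 0 else -math.inf
    return F(a) / F(b)

def is_unbounded(b, a):
    """True when no ratio a_ik / b_i is positive, so the entering variable
    can be increased without limit."""
    return max(ratio(ai, bi) for ai, bi in zip(a, b)) <= 0

# Dictionary of this section with x3 entering: rows x2, x4, x5.
b = [5, 7, 0]
a = [-2, 0, 0]
print([ratio(ai, bi) for ai, bi in zip(a, b)], is_unbounded(b, a))
```

For the dictionary above the ratios are $-2/5$, $0$, and $0$, so the test reports an unbounded problem.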


5.  Geometry 

When  the  number  of  variables  in  a  linear  programming  problem  is  three  or  less, 
we  can  graph  the  set  of  feasible  solutions  together  with  the  level  sets  of  the  objective 
function.  From  this  picture,  it  is  usually  a  trivial  matter  to  write  down  the  optimal 
solution.  To  illustrate,  consider  the  following  problem: 

$$
\begin{array}{ll}
\text{maximize}   & 3x_1 + 2x_2 \\
\text{subject to} & -x_1 + 3x_2 \le 12 \\
                  & x_1 + x_2 \le 8 \\
                  & 2x_1 - x_2 \le 10 \\
                  & x_1, x_2 \ge 0.
\end{array}
$$

Each constraint (including the nonnegativity constraints on the variables) is a
half-plane. These half-planes can be determined by first graphing the equation one
obtains by replacing the inequality with an equality and then asking whether or not
some specific point that doesn't satisfy the equality (often (0, 0) can be used)
satisfies the inequality constraint. The set of feasible solutions is just the intersection of
these half-planes. For the problem given above, this set is shown in Figure 2.1. Also
shown are two level sets of the objective function. One of them indicates points at
which the objective function value is 11. This level set passes through the middle
of the set of feasible solutions. As the objective function value increases, the
corresponding level set moves to the right. The level set corresponding to the case where
the objective function equals 22 is the last level set that touches the set of feasible
solutions. Clearly, this is the maximum value of the objective function. The optimal
solution is the intersection of this level set with the set of feasible solutions. Hence,
we see from Figure 2.1 that the optimal solution is $(x_1, x_2) = (6, 2)$.

Figure 2.1. The set of feasible solutions together with level sets
of the objective function.
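The graphical argument can be cross-checked numerically: in two dimensions, the corners of the feasible set are intersections of pairs of constraint boundary lines, and the maximum of a linear objective over a bounded feasible set is attained at one of them. The Python sketch below enumerates those corners; it is an illustrative check under these assumptions, not a method proposed in the text, and all names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

# Each constraint (a1, a2, rhs) means a1*x1 + a2*x2 <= rhs.
# The last two rows encode x1 >= 0 and x2 >= 0.
cons = [(-1, 3, 12), (1, 1, 8), (2, -1, 10), (-1, 0, 0), (0, -1, 0)]

def vertices(cons):
    """Feasible intersection points of all pairs of boundary lines."""
    pts = set()
    for (a1, a2, r), (b1, b2, s) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines never meet
        # Cramer's rule for the 2x2 system.
        x1 = F(r * b2 - a2 * s, det)
        x2 = F(a1 * s - r * b1, det)
        if all(c1 * x1 + c2 * x2 <= t for c1, c2, t in cons):
            pts.add((x1, x2))
    return pts

def objective(p):
    return 3 * p[0] + 2 * p[1]

best = max(vertices(cons), key=objective)
print(best, objective(best))
```

This brute-force enumeration is only practical for tiny problems, which is one reason the simplex method visits corners selectively instead.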


Exercises 


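Solve the following linear programming problems. If you wish, you may check
your arithmetic by using the simple online pivot tool:

www.princeton.edu/~rvdb/JAVA/pivot/simple.html

2.1
$$
\begin{array}{ll}
\text{maximize}   & 6x_1 + 8x_2 + 5x_3 + 9x_4 \\
\text{subject to} & 2x_1 + x_2 + x_3 + 3x_4 \le 5 \\
                  & x_1 + 8x_2 + x_3 + 2x_4 \le 3 \\
                  & x_1, x_2, x_3, x_4 \ge 0.
\end{array}
$$

2.2
$$
\begin{array}{ll}
\text{maximize}   & 2x_1 + x_2 \\
\text{subject to} & 2x_1 + x_2 \le 4 \\
                  & 2x_1 + 3x_2 \le 3 \\
                  & 4x_1 + x_2 \le 5 \\
                  & x_1 + 5x_2 \le 1 \\
                  & x_1, x_2 \ge 0.
\end{array}
$$

2.3
$$
\begin{array}{ll}
\text{maximize}   & 2x_1 - 6x_2 \\
\text{subject to} & -x_1 - x_2 - x_3 \le -2 \\
                  & 2x_1 - x_2 + x_3 \le 1 \\
                  & x_1, x_2, x_3 \ge 0.
\end{array}
$$

2.4
$$
\begin{array}{ll}
\text{maximize}   & -x_1 - 3x_2 - x_3 \\
\text{subject to} & 2x_1 - 5x_2 + x_3 \le -5 \\
                  & 2x_1 - x_2 + 2x_3 \le 4 \\
                  & x_1, x_2, x_3 \ge 0.
\end{array}
$$

2.5
$$
\begin{array}{ll}
\text{maximize}   & x_1 + 3x_2 \\
\text{subject to} & -x_1 - x_2 \le -3 \\
                  & -x_1 + x_2 \le -1 \\
                  & x_1 + 2x_2 \le 4 \\
                  & x_1, x_2 \ge 0.
\end{array}
$$

2.6
$$
\begin{array}{ll}
\text{maximize}   & x_1 + 3x_2 \\
\text{subject to} & -x_1 - x_2 \le -3 \\
                  & -x_1 + x_2 \le -1 \\
                  & x_1 + 2x_2 \le 2 \\
                  & x_1, x_2 \ge 0.
\end{array}
$$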

2.7
$$
\begin{array}{ll}
\text{maximize}   & x_1 + 3x_2 \\
\text{subject to} & -x_1 - x_2 \le -3 \\
                  & -x_1 + x_2 \le -1 \\
                  & -x_1 + 2x_2 \le 2 \\
                  & x_1, x_2 \ge 0.
\end{array}
$$

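2.8
$$
\begin{array}{ll}
\text{maximize}   & 3x_1 + 2x_2 \\
\text{subject to} & x_1 - 2x_2 \le 1 \\
                  & x_1 - x_2 \le 2 \\
                  & 2x_1 - x_2 \le 6 \\
                  & x_1 \le 5 \\
                  & 2x_1 + x_2 \le 16 \\
                  & x_1 + x_2 \le 12 \\
                  & x_1 + 2x_2 \le 21 \\
                  & x_2 \le 10 \\
                  & x_1, x_2 \ge 0.
\end{array}
$$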

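2.9
$$
\begin{array}{ll}
\text{maximize}   & 2x_1 + 3x_2 + 4x_3 \\
\text{subject to} & -2x_2 - 3x_3 \ge -5 \\
                  & x_1 + x_2 + 2x_3 \le 4 \\
                  & x_1 + 2x_2 + 3x_3 \le 7 \\
                  & x_1, x_2, x_3 \ge 0.
\end{array}
$$

2.10
$$
\begin{array}{ll}
\text{maximize}   & 6x_1 + 8x_2 + 5x_3 + 9x_4 \\
\text{subject to} & x_1 + x_2 + x_3 + x_4 = 1 \\
                  & x_1, x_2, x_3, x_4 \ge 0.
\end{array}
$$

2.11
$$
\begin{array}{ll}
\text{minimize}   & x_{12} + 8x_{13} + 9x_{14} + 2x_{23} + 7x_{24} + 3x_{34} \\
\text{subject to} & x_{12} + x_{13} + x_{14} \ge 1 \\
                  & -x_{12} + x_{23} + x_{24} = 0 \\
                  & -x_{13} - x_{23} + x_{34} = 0 \\
                  & x_{14} + x_{24} + x_{34} \le 1 \\
                  & x_{12}, x_{13}, \ldots, x_{34} \ge 0.
\end{array}
$$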

2.12 Using today's date (MMYY) for the seed value, solve 10 initially feasible
problems using the online pivot tool:

www.princeton.edu/~rvdb/JAVA/pivot/primal.html

2.13 Using today's date (MMYY) for the seed value, solve 10 not necessarily
feasible problems using the online pivot tool:

www.princeton.edu/~rvdb/JAVA/pivot/primal_x0.html

2.14 Consider the following dictionary:

$$
\begin{array}{rcl}
\zeta & = & 3 + x_1 + 6x_2 \\
w_1 & = & 1 + x_1 - x_2 \\
w_2 & = & 5 - 2x_1 - 3x_2.
\end{array}
$$

Using  the  largest  coefficient  rule  to  pick  the  entering  variable,  compute 
the  dictionary  that  results  after  one  pivot. 

2.15 Show that the following dictionary cannot be the optimal dictionary for
any linear programming problem in which $w_1$ and $w_2$ are the initial slack
variables:

$$
\begin{array}{rcl}
\zeta & = & 4 - w_1 - 2x_2 \\
x_1 & = & 3 - 2x_2 \\
w_2 & = & 1 + w_1 - x_2.
\end{array}
$$

Hint:  if  it  could,  what  was  the  original  problem  from  whence  it  came? 

2.16 Graph the feasible region for Exercise 2.8. Indicate on the graph the
sequence of basic solutions produced by the simplex method.

2.17 Give an example showing that the variable that becomes basic in one
iteration of the simplex method can become nonbasic in the next iteration.

2.18 Show that the variable that becomes nonbasic in one iteration of the
simplex method cannot become basic in the next iteration.

2.19 Solve the following linear programming problem:

$$
\begin{array}{lll}
\text{maximize}   & \sum_{j=1}^{n} p_j x_j & \\
\text{subject to} & \sum_{j=1}^{n} q_j x_j \le \beta & \\
                  & x_j \le 1 & j = 1, 2, \ldots, n \\
                  & x_j \ge 0 & j = 1, 2, \ldots, n.
\end{array}
$$

Here, the numbers $p_j$, $j = 1, 2, \ldots, n$, are positive and sum to one. The
same is true of the $q_j$'s:

$$\sum_{j=1}^{n} q_j = 1, \qquad q_j > 0.$$

Furthermore (with only minor loss of generality), you may assume that

$$\frac{p_1}{q_1} \le \frac{p_2}{q_2} \le \cdots \le \frac{p_n}{q_n}.$$

Finally, the parameter $\beta$ is a small positive number. See Exercise 1.3 for
the motivation for this problem.

Notes

The simplex method was invented by G.B. Dantzig in 1947. His monograph
(Dantzig 1963) is the classical reference. Most texts describe the simplex method
as a sequence of pivots on a table of numbers called the simplex tableau.
Following Chvátal (1983), we have developed the algorithm using the more memorable
dictionary notation.


CHAPTER  3 


Degeneracy 


In  the  previous  chapter,  we  discussed  what  it  means  when  the  ratios  computed 
to  calculate  the  leaving  variable  are  all  nonpositive  (the  problem  is  unbounded).  In 
this  chapter,  we  take  up  the  more  delicate  issue  of  what  happens  when  some  of  the 
ratios  are  infinite  (i.e.,  their  denominators  vanish). 


1.  Definition  of  Degeneracy 

We say that a dictionary is degenerate if $\bar{b}_i$ vanishes for some $i \in \mathcal{B}$. A
degenerate dictionary could cause difficulties for the simplex method, but it might not.
For example, the dictionary we were discussing at the end of the last chapter,

$$
\begin{array}{rcl}
\zeta & = & 5 + x_3 - x_1 \\
x_2 & = & 5 + 2x_3 - 3x_1 \\
x_4 & = & 7 - 4x_1 \\
x_5 & = & x_1,
\end{array}
$$

is degenerate, but it was clear that the problem was unbounded and therefore no
more pivots were required. Furthermore, had the coefficient of $x_3$ in the equation
for $x_2$ been $-2$ instead of 2, then the simplex method would have picked $x_2$ for the
leaving variable and no difficulties would have been encountered.

Problems arise, however, when a degenerate dictionary produces degenerate
pivots. We say that a pivot is a degenerate pivot if one of the ratios in the calculation
of the leaving variable is $+\infty$; i.e., if the numerator is positive and the denominator
vanishes. To see what happens, let's look at a few examples.


2.  Two  Examples  of  Degenerate  Problems 


Here  is  an  example  of  a  degenerate  dictionary  in  which  the  pivot  is  also 
degenerate: 


\[
\begin{array}{rcl}
\zeta & = & 3 - 0.5x_1 + 2x_2 - 1.5w_1 \\
x_3 & = & 1 - 0.5x_1 - 0.5w_1 \\
w_2 & = & x_1 - x_2 + w_1.
\end{array}
\tag{3.1}
\]

For this dictionary, the entering variable is $x_2$ and the ratios computed to determine
the leaving variable are $0$ and $+\infty$. Hence, the leaving variable is $w_2$, and the fact
that the ratio is infinite means that as soon as $x_2$ is increased from zero to a positive

The  original  version  of  this  chapter  was  revised.  An  erratum  to  this  chapter  can  be  found  at  DOI 
10.1007/978-1-4614-7630-6-26 


R.J. Vanderbei, Linear Programming, International Series in Operations Research
& Management Science 196, DOI 10.1007/978-1-4614-7630-6_3,

©  Springer  Science+Business  Media  New  York  2014 
value, $w_2$ will go negative. Therefore, $x_2$ can’t really increase. Nonetheless, it can
be reclassified from nonbasic to basic (with $w_2$ going the other way). Let’s look at
the result of this degenerate pivot:

\[
\begin{array}{rcl}
\zeta & = & 3 + 1.5x_1 - 2w_2 + 0.5w_1 \\
x_3 & = & 1 - 0.5x_1 - 0.5w_1 \\
x_2 & = & x_1 - w_2 + w_1.
\end{array}
\tag{3.2}
\]

Note that $\zeta$ remains unchanged at 3. Hence, this degenerate pivot has not produced
any increase in the objective function value. Furthermore, the values of the variables
haven’t even changed: both before and after this degenerate pivot, they are

\[
(x_1, x_2, x_3, w_1, w_2) = (0, 0, 1, 0, 0).
\]

But we are now representing this solution in a new way, and perhaps the next pivot
will make an improvement, or if not the next pivot perhaps the one after that. Let’s
see what happens for the problem at hand. The entering variable for the next
iteration is $x_1$ and the leaving variable is $x_3$, producing a nondegenerate pivot that
leads to

\[
\begin{array}{rcl}
\zeta & = & 6 - 3x_3 - 2w_2 - w_1 \\
x_1 & = & 2 - 2x_3 - w_1 \\
x_2 & = & 2 - 2x_3 - w_2.
\end{array}
\]

These  two  pivots  illustrate  what  typically  happens.  When  one  reaches  a  degenerate 
dictionary,  it  is  usual  that  one  or  more  of  the  subsequent  pivots  will  be  degenerate 
but  that  eventually  a  nondegenerate  pivot  will  lead  us  away  from  these  degenerate 
dictionaries.  While  it  is  typical  for  some  pivot  to  “break  away”  from  the  degeneracy, 
the  real  danger  is  that  the  simplex  method  will  make  a  sequence  of  degenerate  pivots 
and  eventually  return  to  a  dictionary  that  has  appeared  before,  in  which  case  the 
simplex  method  enters  an  infinite  loop  and  never  finds  an  optimal  solution.  This 
behavior  is  called  cycling. 
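The two pivots above can be replayed mechanically. The following is a minimal sketch, not the software that accompanies the book: it assumes a dictionary is stored as a Python mapping from each basic variable (and `zeta`) to its row, with the key `"1"` holding the constant term. The helper names `pivot` and `ratios` are hypothetical. The ratios follow the convention used in the text, coefficient over right-hand side, so a positive numerator over a vanishing denominator gives $+\infty$.

```python
import math
from fractions import Fraction as F


def pivot(rows, enter, leave):
    """Return the dictionary obtained by swapping `enter` and `leave`."""
    a = F(rows[leave][enter])
    # Solve the leaving row for the entering variable.
    solved = {v: -c / a for v, c in rows[leave].items() if v != enter}
    solved[leave] = 1 / a
    new = {enter: solved}
    for name, row in rows.items():
        if name == leave:
            continue
        k = row.get(enter, 0)
        out = {v: c for v, c in row.items() if v != enter}
        for v, c in solved.items():  # substitute for the entering variable
            out[v] = out.get(v, 0) + k * c
        new[name] = {v: c for v, c in out.items() if c != 0}
    return new


def ratios(rows, enter):
    """Ratios a_ik / b_i for each basic row, with 0/0 counted as 0."""
    out = {}
    for name, row in rows.items():
        if name == "zeta":
            continue
        # The row reads x_i = b - a * x_k + ..., so a is minus the stored coefficient.
        a, b = -row.get(enter, 0), row.get("1", 0)
        if a == 0:
            out[name] = 0
        elif b == 0:
            out[name] = math.inf if a > 0 else -math.inf
        else:
            out[name] = a / b
    return out


# Dictionary (3.1)
d31 = {
    "zeta": {"1": F(3), "x1": F("-0.5"), "x2": F(2), "w1": F("-1.5")},
    "x3": {"1": F(1), "x1": F("-0.5"), "w1": F("-0.5")},
    "w2": {"x1": F(1), "x2": F(-1), "w1": F(1)},
}

r = ratios(d31, "x2")         # ratios for the entering variable x2
d32 = pivot(d31, "x2", "w2")  # degenerate pivot, gives dictionary (3.2)
d33 = pivot(d32, "x1", "x3")  # nondegenerate pivot

print(r)
print(d32["zeta"]["1"], d33["zeta"]["1"])
```

Exact rational arithmetic (`fractions.Fraction`) is used so that a zero right-hand side stays exactly zero; with floating point the degeneracy could be blurred by round-off.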

Unfortunately,  under  certain  pivoting  rules,  cycling  is  possible.  In  fact,  it  is 
possible  even  when  using  one  of  the  most  popular  pivoting  rules: 

• Choose the entering variable as the one with the largest coefficient in the
$\zeta$-row of the dictionary.

• When two or more variables compete for leaving the basis, pick an
$x$-variable over a slack variable and, if there is a choice, use the variable
with the smallest subscript. In other words, reading left to right, pick the
first leaving-variable candidate from the list:

\[
x_1, x_2, \ldots, x_n, w_1, w_2, \ldots, w_m.
\]

However, it is hard to find examples of cycling in which $m$ and $n$ are small. In fact,
it has been shown that if a problem has an optimal solution but cycles off-optimum,
then the problem must involve dictionaries with at least four (non-slack) variables
and two constraints. Here is an example that cycles:

\[
\begin{array}{rcl}
\zeta & = & x_1 - 2x_2 - 2x_4 \\
w_1 & = & -0.5x_1 + 3.5x_2 + 2x_3 - 4x_4 \\
w_2 & = & -0.5x_1 + x_2 + 0.5x_3 - 0.5x_4 \\
w_3 & = & 1 - x_1.
\end{array}
\]
And here is the sequence of dictionaries the above pivot rules produce. For the first
pivot, $x_1$ enters and $w_1$ leaves bringing us to:

\[
\begin{array}{rcl}
\zeta & = & -2w_1 + 5x_2 + 4x_3 - 10x_4 \\
x_1 & = & -2w_1 + 7x_2 + 4x_3 - 8x_4 \\
w_2 & = & w_1 - 2.5x_2 - 1.5x_3 + 3.5x_4 \\
w_3 & = & 1 + 2w_1 - 7x_2 - 4x_3 + 8x_4.
\end{array}
\]

For the second iteration, $x_2$ enters and $w_2$ leaves bringing us to:

\[
\begin{array}{rcl}
\zeta & = & -2w_2 + x_3 - 3x_4 \\
x_1 & = & 0.8w_1 - 2.8w_2 - 0.2x_3 + 1.8x_4 \\
x_2 & = & 0.4w_1 - 0.4w_2 - 0.6x_3 + 1.4x_4 \\
w_3 & = & 1 - 0.8w_1 + 2.8w_2 + 0.2x_3 - 1.8x_4.
\end{array}
\]

For the third iteration, $x_3$ enters and $x_1$ leaves:

\[
\begin{array}{rcl}
\zeta & = & 4w_1 - 16w_2 - 5x_1 + 6x_4 \\
x_3 & = & 4w_1 - 14w_2 - 5x_1 + 9x_4 \\
x_2 & = & -2w_1 + 8w_2 + 3x_1 - 4x_4 \\
w_3 & = & 1 - x_1.
\end{array}
\]

For the fourth iteration, $x_4$ enters and $x_2$ leaves:

\[
\begin{array}{rcl}
\zeta & = & w_1 - 4w_2 - 0.5x_1 - 1.5x_2 \\
x_3 & = & -0.5w_1 + 4w_2 + 1.75x_1 - 2.25x_2 \\
x_4 & = & -0.5w_1 + 2w_2 + 0.75x_1 - 0.25x_2 \\
w_3 & = & 1 - x_1.
\end{array}
\]

In the fifth iteration, $w_1$ enters and $x_3$ leaves:

\[
\begin{array}{rcl}
\zeta & = & -2x_3 + 4w_2 + 3x_1 - 6x_2 \\
w_1 & = & -2x_3 + 8w_2 + 3.5x_1 - 4.5x_2 \\
x_4 & = & x_3 - 2w_2 - x_1 + 2x_2 \\
w_3 & = & 1 - x_1.
\end{array}
\]

Lastly, for the sixth iteration, $w_2$ enters and $x_4$ leaves:

\[
\begin{array}{rcl}
\zeta & = & -2x_4 + x_1 - 2x_2 \\
w_1 & = & 2x_3 - 4x_4 - 0.5x_1 + 3.5x_2 \\
w_2 & = & 0.5x_3 - 0.5x_4 - 0.5x_1 + x_2 \\
w_3 & = & 1 - x_1.
\end{array}
\]

Note  that  we  have  come  back  to  the  original  dictionary,  and  so  from  here  on 
the  simplex  method  simply  cycles  through  these  six  dictionaries  and  never  makes 
any  further  progress  toward  an  optimal  solution.  As  bad  as  cycling  is,  the  following 
theorem  tells  us  that  nothing  worse  can  happen: 

Theorem  3.1.  If  the  simplex  method  fails  to  terminate,  then  it  must  cycle. 

Proof. A dictionary is completely determined by specifying which variables
are basic and which are nonbasic. There are only

\[
\binom{n+m}{m} = \frac{(n+m)!}{n!\,m!}
\]

different possibilities. This number is big, but it is finite. If the simplex method fails
to terminate, it must visit some of these dictionaries more than once. Hence, the
algorithm cycles. □

Note that, if the simplex method cycles, then all the pivots within the cycle
must be degenerate. This is easy to see, since the objective function value never
decreases and must return to its starting value when a dictionary repeats. Hence, it
follows that all the dictionaries within the cycle must have the same objective
function value, i.e., all of these pivots must be degenerate.
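The cycle can be checked by machine. The sketch below is illustrative only and rests on a few assumptions: dictionaries are stored as Python mappings (key `"1"` for the constant term), the helper names `pivot`, `order`, and `largest_coefficient_step` are hypothetical, and ties for the entering variable (which do not occur in this example) are broken in the same left-to-right order as for the leaving variable. It replays the largest-coefficient rule on the cycling example, records the objective value after each pivot, and compares the cycle length with the bound $\binom{n+m}{m}$ from Theorem 3.1.

```python
from fractions import Fraction as F
from math import comb


def pivot(rows, enter, leave):
    """Return the dictionary obtained by swapping `enter` and `leave`."""
    a = F(rows[leave][enter])
    # Solve the leaving row for the entering variable.
    solved = {v: -c / a for v, c in rows[leave].items() if v != enter}
    solved[leave] = 1 / a
    new = {enter: solved}
    for name, row in rows.items():
        if name == leave:
            continue
        k = row.get(enter, 0)
        out = {v: c for v, c in row.items() if v != enter}
        for v, c in solved.items():  # substitute for the entering variable
            out[v] = out.get(v, 0) + k * c
        new[name] = {v: c for v, c in out.items() if c != 0}
    return new


def order(name):
    """Tie-break order x1, ..., xn, w1, ..., wm."""
    return (name[0] != "x", int(name[1:]))


def largest_coefficient_step(rows):
    """Return (enter, leave) under the rule in the text, or None if optimal."""
    gains = {v: c for v, c in rows["zeta"].items() if v != "1" and c > 0}
    if not gains:
        return None
    enter = max(sorted(gains, key=order), key=gains.get)
    # Each row with a negative coefficient bounds the increase of `enter`.
    bounds = [(row.get("1", 0) / -row[enter], order(b), b)
              for b, row in rows.items()
              if b != "zeta" and row.get(enter, 0) < 0]
    return enter, min(bounds)[2]  # assumes the problem is not unbounded


start = {
    "zeta": {"x1": F(1), "x2": F(-2), "x4": F(-2)},
    "w1": {"x1": F("-0.5"), "x2": F("3.5"), "x3": F(2), "x4": F(-4)},
    "w2": {"x1": F("-0.5"), "x2": F(1), "x3": F("0.5"), "x4": F("-0.5")},
    "w3": {"1": F(1), "x1": F(-1)},
}

d, history, values = start, [], []
for _ in range(6):
    enter, leave = largest_coefficient_step(d)
    history.append((enter, leave))
    d = pivot(d, enter, leave)
    values.append(d["zeta"].get("1", 0))  # objective value after the pivot

print(history)
print("back at the start:", d == start)
print("possible dictionaries:", comb(4 + 3, 3))
```

Here $n = 4$ and $m = 3$, so at most $\binom{7}{3} = 35$ dictionaries exist, of which the cycle visits six.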

In  practice,  degeneracy  is  common  because  a  zero  right-hand  side  value  crops 
up  frequently  in  real-world  problems.  Cycling  is  not  as  common,  but  it  can  happen 
and  therefore  must  be  addressed.  Computer  implementations  of  the  simplex  method 
in  which  numbers  are  represented  as  integers  or  as  simple  rational  numbers  are  at 
risk  of  cycling  and  one  of  the  techniques  described  in  the  following  sections  must 
be  used  to  avoid  the  problem.  But,  most  implementations  of  the  simplex  method 
are  written  with  floating  point  numbers  (the  computer  approximation  to  a  full  set  of 
real  numbers).  With  floating  point  computation  there  is  inevitable  round-off  error. 
Hence,  a  zero  appearing  as  a  right-hand  side  value  generally  shows  up  not  as  an 
exact  zero  but  rather  as  a  very  small  number.  The  result  is  that  the  dictionary  appears 
to  be  slightly  off  from  actually  being  degenerate  and  therefore  cycling  is  usually 
avoided. 


3.  The  Perturbation/Lexicographic  Method 

As we have seen, there is not just one algorithm called the simplex method.
Instead, the simplex method is a whole family of related algorithms from which
we can pick a specific instance by specifying what we have been referring to as
pivoting rules. We have also seen that, using a very natural pivoting rule, the
simplex method can fail to converge to an optimal solution by occasionally cycling
indefinitely through a sequence of degenerate pivots associated with a nonoptimal
solution.

So  this  raises  a  natural  question:  are  there  pivoting  rules  for  which  the  simplex 
method  will,  with  certainty,  either  reach  an  optimal  solution  or  prove  that  no  such 
solution  exists?  The  answer  to  this  question  is  yes,  and  we  shall  present  two  choices 
of  such  pivoting  rules. 

The first method is based on the observation that degeneracy is sort of an
accident. That is, a dictionary is degenerate if one or more of the $b_i$’s vanish. Our
examples have generally used small integers for the data, and in this case it doesn’t
seem too surprising that sometimes cancellations occur and we end up with a
degenerate dictionary. But each right-hand side could in fact be any real number, and
in the world of real numbers the occurrence of any specific number, such as zero,
seems to be quite unlikely. So how about perturbing a given problem by adding
small random perturbations independently to each of the right-hand sides? If these
perturbations are small enough, we can think of them as insignificant and hence not
really changing the problem. If they are chosen independently, then the probability
of an exact cancellation is zero.
Such random perturbation schemes are used in some implementations, but what
we have in mind as we discuss perturbation methods is something a little bit
different. Instead of using independent identically distributed random perturbations, let us
consider using a fixed perturbation for each constraint, with the perturbation getting
much smaller on each succeeding constraint. Indeed, we introduce a small positive
number $\epsilon_1$ for the first constraint and then a much smaller positive number $\epsilon_2$ for
the second constraint, etc. We write this as

\[
0 < \epsilon_m \ll \cdots \ll \epsilon_2 \ll \epsilon_1 \ll \text{all other data}.
\]

The idea is that each $\epsilon_i$ acts on an entirely different scale from all the other $\epsilon_i$’s and
the data for the problem. What we mean by this is that no linear combination of
the $\epsilon_i$’s using coefficients that might arise in the course of the simplex method can
ever produce a number whose size is of the same order as the data in the problem.
Similarly, each of the “lower down” $\epsilon_i$’s can never “escalate” to a higher level.
Hence, cancellations can only occur on a given scale. Of course, this complete
isolation of scales can never be truly achieved in the real numbers, so instead of
actually introducing specific values for the $\epsilon_i$’s, we simply treat them as abstract
symbols having these scale properties.

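To illustrate what we mean, let’s look at a specific example. Consider the
following degenerate dictionary:

\[
\begin{array}{rcl}
\zeta & = & 6x_1 + 4x_2 \\
w_1 & = & 0 + 9x_1 + 4x_2 \\
w_2 & = & 0 - 4x_1 - 2x_2 \\
w_3 & = & 1 - x_2.
\end{array}
\]

The first step is to introduce symbolic parameters

\[
0 < \epsilon_3 \ll \epsilon_2 \ll \epsilon_1
\]

to get a perturbed problem:

\[
\begin{array}{rcl}
\zeta & = & 6x_1 + 4x_2 \\
w_1 & = & 0 + \epsilon_1 + 9x_1 + 4x_2 \\
w_2 & = & 0 + \epsilon_2 - 4x_1 - 2x_2 \\
w_3 & = & 1 + \epsilon_3 - x_2.
\end{array}
\]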

This dictionary is not degenerate. The entering variable is $x_1$ and the leaving
variable is unambiguously $w_2$. The next dictionary is

\[
\begin{array}{rcl}
\zeta & = & 1.5\epsilon_2 - 1.5w_2 + x_2 \\
w_1 & = & 0 + \epsilon_1 + 2.25\epsilon_2 - 2.25w_2 - 0.5x_2 \\
x_1 & = & 0 + 0.25\epsilon_2 - 0.25w_2 - 0.5x_2 \\
w_3 & = & 1 + \epsilon_3 - x_2.
\end{array}
\]

For the next pivot, the entering variable is $x_2$ and, using the fact that $\epsilon_2 \ll \epsilon_1$, we
see that the leaving variable is $x_1$. The new dictionary is

\[
\begin{array}{rcl}
\zeta & = & 2\epsilon_2 - 2w_2 - 2x_1 \\
w_1 & = & 0 + \epsilon_1 + 2\epsilon_2 - 2w_2 + x_1 \\
x_2 & = & 0 + 0.5\epsilon_2 - 0.5w_2 - 2x_1 \\
w_3 & = & 1 - 0.5\epsilon_2 + \epsilon_3 + 0.5w_2 + 2x_1.
\end{array}
\]

This last dictionary is optimal. At this point, we simply drop the symbolic
parameters and get an optimal dictionary for the unperturbed problem:

\[
\begin{array}{rcl}
\zeta & = & -2w_2 - 2x_1 \\
w_1 & = & 0 - 2w_2 + x_1 \\
x_2 & = & 0 - 0.5w_2 - 2x_1 \\
w_3 & = & 1 + 0.5w_2 + 2x_1.
\end{array}
\]

When treating the $\epsilon_i$’s as symbols, the method is called the lexicographic
method. Note that the lexicographic method does not affect the choice of entering
variable but does amount to a precise prescription for the choice of leaving variable.
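One way to picture the lexicographic rule in code is sketched below. This is a hedged illustration, not the book's implementation: it assumes the symbols are stored as extra keys `e1`, `e2`, `e3` in each row next to the constant key `"1"`, and the helper names `pivot` and `lexicographic_leave` are hypothetical. Because $\epsilon_3 \ll \epsilon_2 \ll \epsilon_1 \ll$ data, two perturbed bounds are compared first by their constant terms, then by their $\epsilon_1$ coefficients, then $\epsilon_2$, and so on, which is exactly a lexicographic comparison of coefficient vectors.

```python
from fractions import Fraction as F


def pivot(rows, enter, leave):
    """Return the dictionary obtained by swapping `enter` and `leave`."""
    a = F(rows[leave][enter])
    # Solve the leaving row for the entering variable.
    solved = {v: -c / a for v, c in rows[leave].items() if v != enter}
    solved[leave] = 1 / a
    new = {enter: solved}
    for name, row in rows.items():
        if name == leave:
            continue
        k = row.get(enter, 0)
        out = {v: c for v, c in row.items() if v != enter}
        for v, c in solved.items():  # substitute for the entering variable
            out[v] = out.get(v, 0) + k * c
        new[name] = {v: c for v, c in out.items() if c != 0}
    return new


def lexicographic_leave(rows, enter, symbols):
    """Pick the row whose perturbed bound on `enter` is smallest."""
    best = None
    for b, row in rows.items():
        if b == "zeta" or row.get(enter, 0) >= 0:
            continue
        scale = -row[enter]
        # Bound as a vector: (constant, coeff of e1, coeff of e2, ...).
        key = [row.get(s, 0) / scale for s in ["1"] + symbols]
        if best is None or key < best[0]:
            best = (key, b)
    return None if best is None else best[1]


symbols = ["e1", "e2", "e3"]
d = {  # the perturbed dictionary from the text
    "zeta": {"x1": F(6), "x2": F(4)},
    "w1": {"e1": F(1), "x1": F(9), "x2": F(4)},
    "w2": {"e2": F(1), "x1": F(-4), "x2": F(-2)},
    "w3": {"1": F(1), "e3": F(1), "x2": F(-1)},
}

history = []
while True:
    # Entering variable: largest positive coefficient among true variables.
    gains = {v: c for v, c in d["zeta"].items()
             if v != "1" and v not in symbols and c > 0}
    if not gains:
        break
    enter = max(gains, key=gains.get)
    leave = lexicographic_leave(d, enter, symbols)
    if leave is None:
        raise ValueError("problem is unbounded")
    history.append((enter, leave))
    d = pivot(d, enter, leave)

# Drop the symbols to recover a dictionary for the unperturbed problem.
final = {b: {v: c for v, c in row.items() if v not in symbols}
         for b, row in d.items()}
print(history)
print(final)
```

Storing the symbols as ordinary columns means the pivot code needs no special cases; only the ratio test changes.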

It  turns  out  that  the  lexicographic  method  produces  a  variant  of  the  simplex 
method  that  never  cycles: 

Theorem  3.2.  The  simplex  method  always  terminates  provided  that  the  leaving 
variable  is  selected  by  the  lexicographic  rule. 

Proof. It suffices to show that no degenerate dictionary is ever produced. As
we’ve discussed before, the $\epsilon_i$’s operate on different scales and hence can’t cancel
with each other. Therefore, we can think of the $\epsilon_i$’s as a collection of independent
variables. Extracting the $\epsilon$ terms from the first dictionary, we see that we start with
the following pattern:

\[
\begin{array}{cccc}
\epsilon_1 & & & \\
& \epsilon_2 & & \\
& & \ddots & \\
& & & \epsilon_m.
\end{array}
\]

And, after several pivots, the $\epsilon$ terms will form a system of linear combinations, say,

\[
\begin{array}{ccccccc}
r_{11}\epsilon_1 & + & r_{12}\epsilon_2 & \cdots & + & r_{1m}\epsilon_m \\
r_{21}\epsilon_1 & + & r_{22}\epsilon_2 & \cdots & + & r_{2m}\epsilon_m \\
\vdots & & \vdots & & & \vdots \\
r_{m1}\epsilon_1 & + & r_{m2}\epsilon_2 & \cdots & + & r_{mm}\epsilon_m.
\end{array}
\]

Since this system of linear combinations is obtained from the original system by
pivot operations and, since pivot operations are reversible, it follows that the rank
of the two systems must be the same. Since the original system had rank $m$, we see
that every subsequent system must have rank $m$. This means that there must be at
least one nonzero $r_{ij}$ in every row $i$, which of course implies that none of the rows
can be degenerate. Hence, no dictionary can be degenerate. □
4. Bland’s Rule

The second pivoting rule we consider is called Bland’s rule. It stipulates that
both the entering and the leaving variable be selected from their respective sets of
choices by choosing the variable $x_k$ with the smallest index $k$.

Theorem  3.3.  The  simplex  method  always  terminates  provided  that  both  the 
entering  and  the  leaving  variable  are  chosen  according  to  Bland’s  rule. 

The  proof  may  look  rather  involved,  but  the  reader  who  spends  the  time  to 
understand  it  will  find  the  underlying  elegance  most  rewarding. 

Proof. It suffices to show that such a variant of the simplex method never
cycles. We prove this by assuming that cycling does occur and then showing that
this assumption leads to a contradiction. So let’s assume that cycling does occur.
Without loss of generality, we may assume that it happens from the beginning. Let
$D_0, D_1, \ldots, D_{k-1}$ denote the dictionaries through which the method cycles. That
is, the simplex method produces the following sequence of dictionaries:

\[
D_0, D_1, \ldots, D_{k-1}, D_0, D_1, \ldots.
\]

We say that a variable is fickle if it is in some basis and not in some other basis.
Let $x_t$ be the fickle variable having the largest index and let $D$ denote a dictionary
in $D_0, D_1, \ldots, D_{k-1}$ in which $x_t$ leaves the basis. Again, without loss of generality
we may assume that $D = D_0$. Let $x_s$ denote the corresponding entering variable.
Suppose that $D$ is recorded as follows:

\[
\begin{array}{rcll}
\zeta & = & v + \sum_{j \in \mathcal{N}} c_j x_j & \\
x_i & = & b_i - \sum_{j \in \mathcal{N}} a_{ij} x_j & i \in \mathcal{B}.
\end{array}
\]

Since $x_s$ is the entering variable and $x_t$ is the leaving variable, we have that $s \in \mathcal{N}$
and $t \in \mathcal{B}$.

Now let $D^*$ be a dictionary in $D_1, D_2, \ldots, D_{k-1}$ in which $x_t$ enters the basis.
Suppose that $D^*$ is recorded as follows:

\[
\begin{array}{rcll}
\zeta & = & v^* + \sum_{j \in \mathcal{N}^*} c_j^* x_j & \\
x_i & = & b_i^* - \sum_{j \in \mathcal{N}^*} a_{ij}^* x_j & i \in \mathcal{B}^*.
\end{array}
\tag{3.3}
\]

Since all the dictionaries are degenerate, we have that $v^* = v$, and therefore we can
write the objective function in (3.3) as

\[
\zeta = v + \sum_{j=1}^{n+m} c_j^* x_j,
\tag{3.4}
\]

where we have extended the notation $c_j^*$ to all variables (both original and slack) by
setting $c_j^* = 0$ for $j \in \mathcal{B}^*$.
Ignoring for the moment the possibility that some variables could go negative,
consider the solutions obtained by letting $x_s$ increase while holding all other
variables in $\mathcal{N}$ at zero:

\[
\begin{array}{rcll}
x_s & = & y, & \\
x_j & = & 0, & j \in \mathcal{N} \setminus \{s\}, \\
x_i & = & b_i - a_{is} y, & i \in \mathcal{B}.
\end{array}
\]

The objective function at this point is given by

\[
\zeta = v + c_s y.
\]

However, using (3.4), we see that it is also given by

\[
\zeta = v + c_s^* y + \sum_{i \in \mathcal{B}} c_i^* (b_i - a_{is} y).
\]

Equating these two expressions for $\zeta$, we see that

\[
\Bigl( c_s - c_s^* + \sum_{i \in \mathcal{B}} c_i^* a_{is} \Bigr) y = \sum_{i \in \mathcal{B}} c_i^* b_i.
\]

Since this equation must be an identity for every $y$, it follows that the coefficient
multiplying $y$ must vanish (as must the right-hand side):

\[
c_s - c_s^* + \sum_{i \in \mathcal{B}} c_i^* a_{is} = 0.
\]

Now, the fact that $x_s$ is the entering variable in $D$ implies that

\[
c_s > 0.
\]

Recall that $x_t$ is the fickle variable with the largest index. Since $x_s$ is also fickle, we
see that $s < t$. Since $x_s$ is not the entering variable in $D^*$ (as $x_t$ is), we see that

\[
c_s^* \le 0.
\]

From these last three displayed equations, we get

\[
\sum_{i \in \mathcal{B}} c_i^* a_{is} < 0.
\]

Hence, there must exist an index $r \in \mathcal{B}$ for which

\[
c_r^* a_{rs} < 0.
\tag{3.5}
\]

Consequently, $c_r^* \ne 0$ and $r \in \mathcal{N}^*$. Hence, $x_r$ is fickle and therefore $r \le t$. In
fact, $r < t$, since $c_t^* a_{ts} > 0$. To see that this product is positive, note that both its
factors are positive: $c_t^*$ is positive, since $x_t$ is the entering variable in $D^*$, and $a_{ts}$ is
positive, since $x_t$ is the leaving variable in $D$.

The fact that $r < t$ implies that $c_r^* \le 0$ (otherwise, according to the smallest
index criterion, $r$ would be the entering variable for $D^*$). Hence, (3.5) implies that

\[
a_{rs} > 0.
\]

Now, since each of the dictionaries in the cycle describes the same solution, it follows
that every fickle variable is zero in all these dictionaries (since it is clearly zero in
a dictionary in which it is nonbasic). In particular, $x_r = 0$. But in $D$, $x_r$ is basic.
Hence,

\[
b_r = 0.
\]

These last two displayed equations imply that $x_r$ was a candidate to be the leaving
variable in $D$, and since $r < t$, it should have been chosen over $x_t$. This is the
contradiction we have been looking for. □
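As a sanity check on Theorem 3.3, Bland's rule can be tried on the cycling example of Section 2. The sketch below is illustrative and makes the following assumptions: dictionaries are Python mappings with the constant under the key `"1"`, the helper names `pivot`, `order`, and `bland_step` are hypothetical, and the slack variable $w_i$ is given index $n + i$, so that the index order is $x_1, \ldots, x_n, w_1, \ldots, w_m$.

```python
from fractions import Fraction as F


def pivot(rows, enter, leave):
    """Return the dictionary obtained by swapping `enter` and `leave`."""
    a = F(rows[leave][enter])
    # Solve the leaving row for the entering variable.
    solved = {v: -c / a for v, c in rows[leave].items() if v != enter}
    solved[leave] = 1 / a
    new = {enter: solved}
    for name, row in rows.items():
        if name == leave:
            continue
        k = row.get(enter, 0)
        out = {v: c for v, c in row.items() if v != enter}
        for v, c in solved.items():  # substitute for the entering variable
            out[v] = out.get(v, 0) + k * c
        new[name] = {v: c for v, c in out.items() if c != 0}
    return new


def order(name):
    """Index order x1, ..., xn, w1, ..., wm (w_i treated as x_{n+i})."""
    return (name[0] != "x", int(name[1:]))


def bland_step(rows):
    """Return (enter, leave) chosen by smallest index, or None if optimal."""
    candidates = [v for v, c in rows["zeta"].items() if v != "1" and c > 0]
    if not candidates:
        return None
    enter = min(candidates, key=order)
    bounds = [(row.get("1", 0) / -row[enter], order(b), b)
              for b, row in rows.items()
              if b != "zeta" and row.get(enter, 0) < 0]
    if not bounds:
        raise ValueError("problem is unbounded")
    return enter, min(bounds)[2]  # smallest ratio, then smallest index


start = {  # the cycling example from Section 2
    "zeta": {"x1": F(1), "x2": F(-2), "x4": F(-2)},
    "w1": {"x1": F("-0.5"), "x2": F("3.5"), "x3": F(2), "x4": F(-4)},
    "w2": {"x1": F("-0.5"), "x2": F(1), "x3": F("0.5"), "x4": F("-0.5")},
    "w3": {"1": F(1), "x1": F(-1)},
}

d, pivots = start, 0
step = bland_step(d)
while step is not None:
    d = pivot(d, *step)
    pivots += 1
    step = bland_step(d)

print("pivots:", pivots)
print("optimal value:", d["zeta"].get("1", 0))
```

On this example the first five pivots coincide with those of the largest-coefficient rule. The sixth differs ($x_1$ enters instead of $w_2$), and one further, nondegenerate pivot reaches an optimal dictionary.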

5.  Fundamental  Theorem  of  Linear  Programming 

Now  that  we  have  a  Phase  I  algorithm  and  a  variant  of  the  simplex  method  that 
is  guaranteed  to  terminate,  we  can  summarize  the  main  points  of  this  chapter  in  the 
following  theorem: 

THEOREM  3.4.  For  an  arbitrary  linear  program  in  standard  form,  the  following 
statements  are  true: 

(1)  If  there  is  no  optimal  solution,  then  the  problem  is  either  infeasible  or 
unbounded. 

(2)  If  a  feasible  solution  exists,  then  a  basic  feasible  solution  exists. 

(3)  If  an  optimal  solution  exists,  then  a  basic  optimal  solution  exists. 

Proof.  The  Phase  I  algorithm  either  proves  that  the  problem  is  infeasible  or 
produces  a  basic  feasible  solution.  The  Phase  II  algorithm  either  discovers  that  the 
problem  is  unbounded  or  finds  a  basic  optimal  solution.  These  statements  depend, 
of  course,  on  applying  a  variant  of  the  simplex  method  that  does  not  cycle,  which 
we  now  know  to  exist.  □ 


6.  Geometry 

As  we  saw  in  the  previous  chapter,  the  set  of  feasible  solutions  for  a  problem 
in  two  dimensions  is  the  intersection  of  a  number  of  halfplanes,  i.e.,  a  polygon. 
In  three  dimensions,  the  situation  is  similar.  Consider,  for  example,  the  following 
problem: 


\[
\begin{array}{lrcl}
\text{maximize}   & x_1 + 2x_2 + 3x_3 & & \\
\text{subject to} & x_1 + 2x_3 & \le & 3 \\
                  & x_2 + 2x_3 & \le & 2 \\
                  & x_1, x_2, x_3 & \ge & 0.
\end{array}
\tag{3.6}
\]

The set of points satisfying $x_1 + 2x_3 = 3$ is a plane. The inequality $x_1 + 2x_3 \le 3$
therefore consists of all points on one side of this plane; that is, it is a halfspace.
The same is true for each of the other four inequalities. The feasible set consists
of those points in space that satisfy all five inequalities, i.e., those points lying in
the intersection of these halfspaces. This set is the polyhedron shown in Figure 3.1.
This polyhedron is bordered by five facets, each facet being a portion of one of the
planes that was defined by replacing a constraint inequality with an equation. For
example, the “front” facet in the figure is a portion of the plane $x_1 + 2x_3 = 3$.
Figure 3.1. The set of feasible solutions for the problem given
by (3.6).


The facets acquire a particularly simple description if we introduce slack variables
into the problem:

\[
\begin{array}{rcl}
w_1 & = & 3 - x_1 - 2x_3 \\
w_2 & = & 2 - x_2 - 2x_3.
\end{array}
\]

Indeed, each facet corresponds precisely to some variable (either original or slack)
vanishing. For instance, the front facet in the figure corresponds to $w_1 = 0$ whereas
the “left” facet corresponds to $x_2 = 0$.

The correspondences can be continued. Indeed, each edge of the polyhedron
corresponds to a pair of variables vanishing. For example, the edge lying at the
interface of the left and the front facets in the figure corresponds to both $w_1 = 0$ and
$x_2 = 0$.

Going further yet, each vertex of the polyhedron corresponds to three variables
vanishing. For instance, the vertex whose coordinates are $(1, 0, 1)$ corresponds to
$w_1 = 0$, $x_2 = 0$, and $w_2 = 0$.

Now, let’s think about applying the simplex method to this problem. Every
basic feasible solution involves two basic variables and three nonbasic variables.
Furthermore, the three nonbasic variables are, by definition, zero in the basic
feasible solution. Therefore, for this example, the basic feasible solutions stand in
one-to-one correspondence with the vertices of the polyhedron. In fact, applying
the simplex method to this problem, one discovers that the sequence of vertices
visited by the algorithm is

\[
(0,0,0) \to (0,0,1) \to (1,0,1) \to (3,2,0).
\]

Figure 3.2. The set of feasible solutions for the (degenerate)
problem given by (3.7).

The  example  we’ve  been  considering  has  the  nice  property  that  every  vertex 
is  formed  by  the  intersection  of  exactly  three  of  the  facets.  But  consider  now  the 
following  problem: 

\[
\begin{array}{lrcl}
\text{maximize}   & x_1 + 2x_2 + 3x_3 & & \\
\text{subject to} & x_1 + 2x_3 & \le & 2 \\
                  & x_2 + 2x_3 & \le & 2 \\
                  & x_1, x_2, x_3 & \ge & 0.
\end{array}
\tag{3.7}
\]

Algebraically, the only difference between this problem and the previous one is that
the right-hand side of the first inequality is now a 2 instead of a 3. But look at
the polyhedron of feasible solutions shown in Figure 3.2. The vertex $(0, 0, 1)$ is
at the intersection of four of the facets, not three as one would “normally” expect.
This vertex does not correspond to one basic feasible solution; rather, there are four
degenerate basic feasible solutions, each representing it. We’ve seen two of them
before. Indeed, dictionaries (3.1) and (3.2) correspond to two of these degenerate
dictionaries (in fact, dictionary (3.1) is the dictionary one obtains after one pivot of
the simplex method applied to problem (3.7)).
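The count of four can be confirmed by brute force. The sketch below is an illustration under stated assumptions rather than a general-purpose routine: it hard-codes the two equality constraints $x_1 + 2x_3 + w_1 = b_1$ and $x_2 + 2x_3 + w_2 = b_2$ shared by problems (3.6) and (3.7), tries every pair of variables as a basis, solves the resulting $2 \times 2$ system by Cramer's rule, and groups the feasible bases by the vertex $(x_1, x_2, x_3)$ they represent. The function name `vertices` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import combinations

NAMES = ["x1", "x2", "x3", "w1", "w2"]
# Columns of the system  x1 + 2*x3 + w1 = b1,  x2 + 2*x3 + w2 = b2.
COLS = {"x1": (1, 0), "x2": (0, 1), "x3": (2, 2), "w1": (1, 0), "w2": (0, 1)}


def vertices(b):
    """Map each vertex (x1, x2, x3) to the feasible bases that represent it."""
    found = defaultdict(list)
    for basis in combinations(NAMES, 2):
        (a, c), (p, q) = COLS[basis[0]], COLS[basis[1]]
        det = a * q - p * c
        if det == 0:
            continue  # the two columns do not form a basis
        # Cramer's rule for  u*(a, c) + v*(p, q) = (b1, b2).
        u = F(b[0] * q - p * b[1], det)
        v = F(a * b[1] - c * b[0], det)
        if u < 0 or v < 0:
            continue  # basic solution is infeasible
        value = dict.fromkeys(NAMES, F(0))  # nonbasic variables are zero
        value[basis[0]], value[basis[1]] = u, v
        found[(value["x1"], value["x2"], value["x3"])].append(basis)
    return dict(found)


v36 = vertices((3, 2))  # problem (3.6)
v37 = vertices((2, 2))  # problem (3.7)

for vertex, bases in v37.items():
    print(tuple(map(str, vertex)), bases)
```

For problem (3.6) every vertex comes from exactly one basis, whereas for problem (3.7) the vertex $(0, 0, 1)$ is produced by four different bases, each pairing $x_3$ with one of the four variables that vanish there.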

We end by considering the geometric effect of the perturbation method for
resolving degeneracy. By perturbing the right-hand sides, one moves the planes that
determine the facets. If the moves are random or chosen with vastly different
magnitudes (all small), then one would expect that each vertex in the perturbed problem
would be determined by exactly three planes. That is, degenerate vertices from the
original problem get split into multiple nearby vertices in the perturbed problem.
For example, problem (3.6) can be thought of as a perturbation of degenerate
problem (3.7) (the perturbation isn’t small, but it also isn’t so large as to obscure the
effect). Note how the degenerate vertex in Figure 3.2 appears as two vertices in
Figure 3.1.


Exercises 

3.1 Solve the following linear program using the perturbation method to
resolve degeneracy:

\[
\begin{array}{lrcl}
\text{maximize}   & 10x_1 - 57x_2 - 9x_3 - 24x_4 & & \\
\text{subject to} & 0.5x_1 - 5.5x_2 - 2.5x_3 + 9x_4 & \le & 0 \\
                  & 0.5x_1 - 1.5x_2 - 0.5x_3 + x_4 & \le & 0 \\
                  & x_1 & \le & 1 \\
                  & x_1, x_2, x_3, x_4 & \ge & 0.
\end{array}
\]

Note: The simple pivot tool with the Lexicographic labels can be used to
check your arithmetic:

www.princeton.edu/~rvdb/JAVA/pivot/simple.html

3.2 Solve the following linear program using Bland’s rule to resolve
degeneracy:

\[
\begin{array}{lrcl}
\text{maximize}   & 10x_1 - 57x_2 - 9x_3 - 24x_4 & & \\
\text{subject to} & 0.5x_1 - 5.5x_2 - 2.5x_3 + 9x_4 & \le & 0 \\
                  & 0.5x_1 - 1.5x_2 - 0.5x_3 + x_4 & \le & 0 \\
                  & x_1 & \le & 1 \\
                  & x_1, x_2, x_3, x_4 & \ge & 0.
\end{array}
\]

3.3 Using today’s date (MMYY) for the seed value, solve 10 possibly
degenerate problems using the online pivot tool:

www.princeton.edu/~rvdb/JAVA/pivot/lexico.html

3.4 Consider the linear programming problems whose right-hand sides are
identically zero:

\[
\begin{array}{lll}
\text{maximize}   & \sum_{j=1}^{n} c_j x_j & \\
\text{subject to} & \sum_{j=1}^{n} a_{ij} x_j \le 0 & i = 1, 2, \ldots, m \\
                  & x_j \ge 0 & j = 1, 2, \ldots, n.
\end{array}
\]

Show that either $x_j = 0$ for all $j$ is optimal or else the problem is
unbounded.

3.5 Consider the following linear program:

\[
\begin{array}{lrcl}
\text{maximize}   & x_1 + 3x_2 & & \\
\text{subject to} & -2x_1 & \le & -5 \\
                  & x_1 & \ge & 0.
\end{array}
\]

Show that this problem has feasible solutions but no vertex solutions. How
does this reconcile with the fundamental theorem of linear programming
(Theorem 3.4)?

3.6  Suppose  that  a  linear  programming  problem  has  the  following  property: 
its  initial  dictionary  is  not  degenerate  and,  when  solved  by  the  simplex 
method,  there  is  never  a  tie  for  the  choice  of  leaving  variable. 

(a)  Can  such  a  problem  have  degenerate  dictionaries?  Explain. 

(b)  Can  such  a  problem  cycle?  Explain. 

3.7 Consider the following dictionary:

\[
\begin{array}{rcl}
\zeta & = & 5 + 2x_2 - 2x_3 + 3x_5 \\
x_6 & = & 4 - 2x_2 - x_3 + x_5 \\
x_4 & = & 2 - x_2 + x_3 - x_5 \\
x_1 & = & 6 - 2x_2 - 2x_3 - 3x_5.
\end{array}
\]

(a) List all pairs $(x_r, x_s)$ such that $x_r$ could be the entering variable and
$x_s$ could be the leaving variable.

(b) List all such pairs if the largest-coefficient rule for choosing the
entering variable is used.

(c) List all such pairs if Bland’s rule for choosing the entering and
leaving variables is used.


Notes 

The  first  example  of  cycling  was  given  by  Hoffman  (1953).  The  fact  that  any 
linear  programming  problem  that  cycles  must  have  at  least  six  variables  and  three 
constraints  was  proved  by  Marshall  and  Suurballe  (1969). 

Early proofs of the fundamental theorem of linear programming (Theorem 3.4)
were constructive, relying, as in our development, on the existence of a variant of
the simplex method that works even in the presence of degeneracy. Hence, finding
such variants occupied the attention of early researchers in linear programming. The
perturbation method was first suggested by A. Orden and developed independently
by Charnes (1952). The essentially equivalent lexicographic method first appeared
in Dantzig et al. (1955). Theorem 3.3 was proved by Bland (1977).

For an extensive treatment of degeneracy issues see Gal (1994).


CHAPTER  4 


Efficiency  of  the  Simplex  Method 


In the previous chapter, we saw that the simplex method (with appropriate
pivoting rules to guarantee no cycling) will solve any linear programming
problem for which an optimal solution exists. In this chapter, we investigate just how
fast it will solve a problem of a given size.

1.  Performance  Measures 

Performance  measures  can  be  broadly  divided  into  two  types: 

•  Worst  case 

•  Average  case. 

As  its  name  implies,  a  worst-case  analysis  looks  at  all  problems  of  a  given  “size”  and 
asks  how  much  effort  is  needed  to  solve  the  hardest  of  these  problems.  Similarly, 
an  average-case  analysis  looks  at  the  average  amount  of  effort,  averaging  over  all 
problems  of  a  given  size.  Worst-case  analyses  are  generally  easier  than  average-case 
analyses.  The  reason  is  that,  for  worst-case  analyses,  one  simply  needs  to  give  an 
upper  bound  on  how  much  effort  is  required  and  then  exhibit  a  specific  example  that 
attains  this  bound.  However,  for  average-case  analyses,  one  must  have  a  stochastic 
model  of  the  space  of  “random  linear  programming  problems”  and  then  be  able  to 
say  something  about  the  solution  effort  averaged  over  all  the  problems  in  the  sample 
space.  There  are  two  serious  difficulties  here.  The  first  is  that  it  is  not  clear  at  all  how 
one  should  model  the  space  of  random  problems.  Secondly,  given  such  a  model,  one 
must  be  able  to  evaluate  the  amount  of  effort  required  to  solve  every  problem  in  the 
sample  space. 

Therefore, worst-case analysis is more tractable than average-case analysis, but
it is also less relevant to a person who needs to solve real problems. In this chapter,
we will start by giving a detailed worst-case analysis of the simplex method using
the largest-coefficient rule to select the entering variable. We will then present and
discuss the results of some empirical studies in which millions of linear programming
problems were generated randomly and solved by the simplex method. Such
studies act as a surrogate for a true average-case analysis.

2.  Measuring  the  Size  of  a  Problem 

Before looking at worst cases, we must discuss two issues. First, how do we
specify the size of a problem? Two parameters come naturally to mind: m and n.
Often, we simply use these two numbers to characterize the size of a problem. However,
we should mention some drawbacks associated with this choice. First of all,
it would be preferable to use only one number to indicate size. Since the data for
a problem consist of the constraint coefficients together with the right-hand side
and objective function coefficients, perhaps we should use the total number of data
elements, which is roughly mn.

The product mn isn't bad, but what if many or even most of the data elements
are zero? Wouldn't one expect such a problem to be easier to solve? Efficient
implementations do indeed take advantage of the presence of lots of zeros, and so an
analysis should also account for this. Hence, a good measure might be simply the
number of nonzero data elements. This would definitely be an improvement, but one
can go further. On a computer, floating-point numbers are all the same size and can
be multiplied in the same amount of time. But if a person is to solve a problem by
hand (or use unlimited precision computation on a computer), then certainly
multiplying 23 by 7 is a lot easier than multiplying 23,453.2352 by 86,833.245643. So
perhaps the best measure of a problem's size is not the number of data elements, but
the actual number of bits needed to store all the data on a computer. This measure
is popular among most computer scientists and is usually denoted by L.
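The three candidate measures just discussed (the product mn, the number of nonzero data elements, and a bit count L) can be compared on a small instance. The Python sketch below is illustrative only: the function name `size_measures`, the sample data, and the convention of charging one sign bit plus the magnitude bits for each nonzero integer are assumptions made here, not definitions taken from the text.

```python
def size_measures(A, b, c):
    """Return (m*n, number of nonzero data elements, bit count) for integer data."""
    m, n = len(A), len(A[0])
    data = [a for row in A for a in row] + list(b) + list(c)
    nonzeros = [v for v in data if v != 0]
    # Assumed convention: one sign bit plus the magnitude bits of each nonzero.
    bits = sum(1 + abs(v).bit_length() for v in nonzeros)
    return m * n, len(nonzeros), bits


# Hypothetical 3-by-3 instance, used only to exercise the function.
A = [[1, 0, 0], [20, 1, 0], [200, 20, 1]]
b = [1, 100, 10000]
c = [100, 10, 1]
mn, nnz, L = size_measures(A, b, c)
print(mn, nnz, L)
```

Even on this tiny example the three numbers differ noticeably, which is the point of the discussion: the choice of measure changes what "size" means.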

However,  with  a  little  further  abstraction,  the  size  of  the  data,  L ,  is  seen  to  be 
ambiguous.  As  we  saw  in  Chapter  1,  real-world  problems,  while  generally  large 
and  sparse,  usually  can  be  described  quite  simply  and  involve  only  a  small  amount 
of  true  input  data  that  gets  greatly  expanded  when  setting  the  problem  up  with  a 
constraint  matrix,  right-hand  side,  and  objective  function.  So  should  L  represent 
the  number  of  bits  needed  to  specify  the  nonzero  constraint  coefficients,  objective 
coefficients,  and  right-hand  sides,  or  should  it  be  the  number  of  bits  in  the  original 
data  set  plus  the  number  of  bits  in  the  description  of  how  this  data  represents  a  linear 
programming  problem?  No  one  currently  uses  this  last  notion  of  problem  size,  but 
it  seems  fairly  reasonable  that  they  should  (or  at  least  that  they  should  seriously 
consider  it).  Anyway,  our  purpose  here  is  merely  to  mention  that  these  important 
issues  are  lurking  about,  but,  as  stated  above,  we  shall  simply  focus  on  m  and  n  to 
characterize  the  size  of  a  problem. 

3.  Measuring  the  Effort  to  Solve  a  Problem 

The second issue to discuss is how one should measure the amount of work
required to solve a problem. The best answer is the number of seconds of computer
time required to solve the problem, using the computer sitting on one's desk.
Unfortunately, there are (hopefully) many readers of this text, not all of whom use the
exact same computer. Even if they did, computer technology changes rapidly, and
a few years down the road everyone would be using something entirely different.
It would be nice if the National Institute of Standards and Technology (the government
organization in charge of setting standards, such as how many threads/inch a
standard light bulb should have) would identify a standard computer for the purpose
of benchmarking algorithms, but, needless to say, this is not very likely. So the
time needed to solve a problem, while the most desirable measure, is not the most
practical one here. Fortunately, there is a fairly reasonable substitute. Algorithms
are generally iterative processes, and the time to solve a problem can be factored
into the number of iterations required to solve the problem times the amount of time
required to do each iteration. The first factor, the number of iterations, does not
depend on the computer and so is a reasonable surrogate for the actual time. This
surrogate is useful when comparing various algorithms within the same general class
of algorithms, in which the time per iteration can be expected to be about the same
among the algorithms; however, it becomes meaningless when one wishes to compare
two entirely different algorithms. For now, we shall measure the amount of
effort to solve a linear programming problem by counting the number of iterations,
i.e., pivots, needed to solve it.


4. Worst-Case Analysis of the Simplex Method


How  bad  can  the  simplex  method  be  in  the  worst  case?  Well,  we  have  already 
seen  that  for  some  pivoting  rules  it  can  cycle,  and  hence  the  worst-case  solution 
time  for  such  variants  is  infinite.  However,  what  about  noncycling  variants  of  the 
simplex  method?  Since  the  simplex  method  operates  by  moving  from  one  basic 
feasible  solution  to  another  without  ever  returning  to  a  previously  visited  solution, 
an  upper  bound  on  the  number  of  iterations  is  simply  the  number  of  basic  feasible 
solutions,  of  which  there  can  be  at  most 

$$\binom{n+m}{m}.$$

For a fixed value of the sum $n + m$, this expression is maximized when $m = n$.
And how big is it? It is not hard to show that

$$\frac{1}{2n}\,2^{2n} \le \binom{2n}{n} \le 2^{2n}$$

(see Exercise 4.9). It should be noted that, even though typographically compact,
the expression $2^{2n}$ is huge even when $n$ is not very big. For example, for $n = 25$,
we have $2^{50} = 1.1259 \times 10^{15}$.
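The two-sided bound on the central binomial coefficient is easy to check numerically. The following Python sketch does so for a few values of n; the helper name `binomial_bounds` is introduced here only for illustration, and a numerical check is of course not a substitute for the proof requested in Exercise 4.9.

```python
from math import comb


def binomial_bounds(n):
    """Return (2^(2n)/(2n), C(2n, n), 2^(2n)) for a positive integer n."""
    upper = 2 ** (2 * n)
    return upper / (2 * n), comb(2 * n, n), upper


for n in (1, 5, 25):
    lower, middle, upper = binomial_bounds(n)
    print(n, lower, middle, upper)
```

For n = 25 the upper bound is the integer 2^50, which is the 1.1259 x 10^15 quoted above.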

Our best chance for finding a bad example is to look at the case where $m = n$.
In 1972, V. Klee and G.J. Minty were the first to discover an example in which the
simplex method using the largest-coefficient rule requires $2^n - 1$ iterations to solve.
The example is quite simple to state:

$$\begin{array}{lll}
\text{maximize} & \displaystyle\sum_{j=1}^{n} 10^{n-j} x_j & \\[2ex]
\text{subject to} & \displaystyle 2\sum_{j=1}^{i-1} 10^{i-j} x_j + x_i \le 100^{i-1} & i = 1, 2, \dots, n \\[2ex]
 & x_j \ge 0 & j = 1, 2, \dots, n.
\end{array} \tag{4.1}$$

It is instructive to look closely at the constraints. The first three constraints are

$$\begin{array}{rcl}
x_1 & \le & 1 \\
20x_1 + x_2 & \le & 100 \\
200x_1 + 20x_2 + x_3 & \le & 10{,}000.
\end{array}$$

The first constraint simply says that $x_1$ is no bigger than one. With this in mind,
the second constraint says that $x_2$ has an upper bound of about 100, depending
on how big $x_1$ is. Similarly, the third constraint says that $x_3$ is roughly no bigger
than 10,000 (again, this statement needs some adjustment depending on the sizes
of $x_1$ and $x_2$). Therefore, the constraints are approximately just a set of upper
bounds, which means that the feasible region is virtually a stretched n-dimensional
hypercube[1]:

$$\begin{array}{c}
0 \le x_1 \le 1 \\
0 \le x_2 \le 100 \\
\vdots \\
0 \le x_n \le 100^{n-1}.
\end{array}$$

For this reason, the feasible region for the Klee-Minty problem is often referred to
as the Klee-Minty cube. An n-dimensional hypercube has $2^n$ vertices, and, as we
shall see, the simplex method with the largest-coefficient rule will start at one of
these vertices and visit every vertex before finally finding the optimal solution.
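To experiment with the Klee-Minty problem it helps to generate its data for any n. The Python sketch below builds the constraint matrix, right-hand side, and objective of problem (4.1) using zero-based indices; the function name `klee_minty` is an assumption made for this illustration.

```python
def klee_minty(n):
    """Data (A, b, c) of problem (4.1): maximize c.x subject to A x <= b, x >= 0."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            A[i][j] = 2 * 10 ** (i - j)        # coefficient 2 * 10^(i-j) below the diagonal
        A[i][i] = 1                            # coefficient of x_i in constraint i
    b = [100 ** i for i in range(n)]           # right-hand sides 1, 100, 10000, ...
    c = [10 ** (n - 1 - j) for j in range(n)]  # objective coefficients 10^(n-j)
    return A, b, c


A, b, c = klee_minty(3)
for row, rhs in zip(A, b):
    print(row, "<=", rhs)
```

With n = 3 the printed rows are the three constraints displayed above.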

In order to gain a deeper understanding of the Klee-Minty problem, we first
replace the specific right-hand sides, $100^{i-1}$, with more generic values, $b_i$, having
the property that

$$1 = b_1 \ll b_2 \ll \cdots \ll b_n.$$

As in the previous chapter, we use the expression $a \ll b$ to mean that a is so much
smaller than b that no factors multiplying a and dividing b that arise in the course of
applying the simplex method to the problem at hand can ever make the resulting a
as large as the resulting b. Hence, we can think of the $b_i$'s as independent variables
for now (specific values can be chosen later). Next, it is convenient to change each
right-hand side replacing $b_i$ with

$$\sum_{j=1}^{i-1} 10^{i-j} b_j + b_i.$$

Since the numbers $b_j$, $j = 1, 2, \dots, i-1$, are "small potatoes" compared with $b_i$,
this modification to the right-hand sides amounts to a very small perturbation. The
right-hand sides still grow by huge amounts as i increases. Finally, we wish to add a
constant to the objective function so that our generalized Klee-Minty problem can
finally be written as


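$$\begin{array}{lll}
\text{maximize} & \displaystyle\sum_{j=1}^{n} 10^{n-j} x_j - \frac{1}{2}\sum_{j=1}^{n} 10^{n-j} b_j & \\[2ex]
\text{subject to} & \displaystyle 2\sum_{j=1}^{i-1} 10^{i-j} x_j + x_i \le \sum_{j=1}^{i-1} 10^{i-j} b_j + b_i & i = 1, 2, \dots, n \\[2ex]
 & x_j \ge 0 & j = 1, 2, \dots, n.
\end{array} \tag{4.2}$$

[1] More precisely, a hyperrectangle.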
In Exercise 4.7, you are asked to prove that this problem takes $2^n - 1$ iterations.
To start to get a handle on the proof, here are the seven iterations that one gets with
$n = 3$. The initial dictionary is

$$\begin{aligned}
\zeta &= -\tfrac{100}{2} b_1 - \tfrac{10}{2} b_2 - \tfrac{1}{2} b_3 + 100x_1 + 10x_2 + x_3 \\
w_1 &= b_1 - x_1 \\
w_2 &= 10b_1 + b_2 - 20x_1 - x_2 \\
w_3 &= 100b_1 + 10b_2 + b_3 - 200x_1 - 20x_2 - x_3,
\end{aligned}$$

which is feasible. Using the largest-coefficient rule, the entering variable is $x_1$. From
the fact that each subsequent $b_i$ is huge compared with its predecessor it follows that
$w_1$ is the leaving variable. After the first iteration, the dictionary reads

$$\begin{aligned}
\zeta &= \tfrac{100}{2} b_1 - \tfrac{10}{2} b_2 - \tfrac{1}{2} b_3 - 100w_1 + 10x_2 + x_3 \\
x_1 &= b_1 - w_1 \\
w_2 &= -10b_1 + b_2 + 20w_1 - x_2 \\
w_3 &= -100b_1 + 10b_2 + b_3 + 200w_1 - 20x_2 - x_3.
\end{aligned}$$

Now, $x_2$ enters and $w_2$ leaves, so after the second iteration we get:

$$\begin{aligned}
\zeta &= -\tfrac{100}{2} b_1 + \tfrac{10}{2} b_2 - \tfrac{1}{2} b_3 + 100w_1 - 10w_2 + x_3 \\
x_1 &= b_1 - w_1 \\
x_2 &= -10b_1 + b_2 + 20w_1 - w_2 \\
w_3 &= 100b_1 - 10b_2 + b_3 - 200w_1 + 20w_2 - x_3.
\end{aligned}$$

After the third iteration

$$\begin{aligned}
\zeta &= \tfrac{100}{2} b_1 + \tfrac{10}{2} b_2 - \tfrac{1}{2} b_3 - 100x_1 - 10w_2 + x_3 \\
w_1 &= b_1 - x_1 \\
x_2 &= 10b_1 + b_2 - 20x_1 - w_2 \\
w_3 &= -100b_1 - 10b_2 + b_3 + 200x_1 + 20w_2 - x_3.
\end{aligned}$$

After the fourth iteration

$$\begin{aligned}
\zeta &= -\tfrac{100}{2} b_1 - \tfrac{10}{2} b_2 + \tfrac{1}{2} b_3 + 100x_1 + 10w_2 - w_3 \\
w_1 &= b_1 - x_1 \\
x_2 &= 10b_1 + b_2 - 20x_1 - w_2 \\
x_3 &= -100b_1 - 10b_2 + b_3 + 200x_1 + 20w_2 - w_3.
\end{aligned}$$

After the fifth iteration

$$\begin{aligned}
\zeta &= \tfrac{100}{2} b_1 - \tfrac{10}{2} b_2 + \tfrac{1}{2} b_3 - 100w_1 + 10w_2 - w_3 \\
x_1 &= b_1 - w_1 \\
x_2 &= -10b_1 + b_2 + 20w_1 - w_2 \\
x_3 &= 100b_1 - 10b_2 + b_3 - 200w_1 + 20w_2 - w_3.
\end{aligned}$$

After the sixth iteration

$$\begin{aligned}
\zeta &= -\tfrac{100}{2} b_1 + \tfrac{10}{2} b_2 + \tfrac{1}{2} b_3 + 100w_1 - 10x_2 - w_3 \\
x_1 &= b_1 - w_1 \\
w_2 &= -10b_1 + b_2 + 20w_1 - x_2 \\
x_3 &= -100b_1 + 10b_2 + b_3 + 200w_1 - 20x_2 - w_3.
\end{aligned}$$

And, finally, after the seventh iteration, we get

$$\begin{aligned}
\zeta &= \tfrac{100}{2} b_1 + \tfrac{10}{2} b_2 + \tfrac{1}{2} b_3 - 100x_1 - 10x_2 - w_3 \\
w_1 &= b_1 - x_1 \\
w_2 &= 10b_1 + b_2 - 20x_1 - x_2 \\
x_3 &= 100b_1 + 10b_2 + b_3 - 200x_1 - 20x_2 - w_3,
\end{aligned}$$

which is, of course, optimal.
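The pivot count can also be reproduced by machine. The Python sketch below runs a dictionary-based simplex method with the largest-coefficient rule on problem (4.1), using exact rational arithmetic so that the huge right-hand sides cause no rounding trouble. The names `klee_minty` and `count_pivots`, and the tie-breaking choice (first index), are assumptions made for this illustration; the sketch is not the book's own code.

```python
from fractions import Fraction


def klee_minty(n):
    """Data (A, b, c) of problem (4.1) with zero-based indices."""
    A = [[2 * 10 ** (i - j) if j < i else int(j == i) for j in range(n)]
         for i in range(n)]
    return A, [100 ** i for i in range(n)], [10 ** (n - 1 - j) for j in range(n)]


def count_pivots(A, b, c):
    """Pivots used by the largest-coefficient rule on max c.x, A x <= b, x >= 0.

    Assumes b >= 0 so the initial dictionary is feasible.
    Returns None if the problem is unbounded.
    """
    m, n = len(A), len(c)
    # Dictionary rows: basic_i = rhs[i] + sum_j D[i][j] * nonbasic_j.
    D = [[Fraction(-a) for a in row] for row in A]
    rhs = [Fraction(v) for v in b]
    obj = [Fraction(v) for v in c]
    pivots = 0
    while max(obj) > 0:
        col = obj.index(max(obj))                      # entering variable
        rows = [i for i in range(m) if D[i][col] < 0]
        if not rows:
            return None                                # unbounded
        row = min(rows, key=lambda i: rhs[i] / -D[i][col])   # ratio test
        a = D[row][col]
        # Solve the leaving row for the entering variable.
        D[row] = [-v / a for v in D[row]]
        D[row][col] = 1 / a
        rhs[row] = -rhs[row] / a
        # Substitute the result into the other rows and the objective.
        for i in range(m):
            if i != row:
                f = D[i][col]
                D[i] = [D[i][j] + f * D[row][j] for j in range(n)]
                D[i][col] = f * D[row][col]
                rhs[i] += f * rhs[row]
        f = obj[col]
        obj = [obj[j] + f * D[row][j] for j in range(n)]
        obj[col] = f * D[row][col]
        pivots += 1
    return pivots


for n in range(1, 5):
    print(n, count_pivots(*klee_minty(n)))
```

For n = 3 this follows the same sequence of entering and leaving variables as the seven dictionaries listed above, with the specific right-hand sides of (4.1) in place of the generic $b_i$.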

A few observations should be made. First, every pivot is the swap of an $x_j$ with
the corresponding $w_j$. Second, every dictionary looks just like the first one with the
exception that the $w_i$'s and the $x_i$'s have become intertwined and various signs have
changed (see Exercise 4.6).

Also note that the final dictionary could have been reached from the initial
dictionary in just one pivot if we had selected $x_3$ to be the entering variable. But
the largest-coefficient rule dictated selecting $x_1$. It is natural to wonder whether the
largest-coefficient rule could be replaced by some other pivot rule for which
the worst-case behavior would be much better than the $2^n$ behavior of the
largest-coefficient rule. So far no one has found such a pivot rule. However, no one has
proved that such a rule does not exist either.

Finally, we mention that one desirable property of an algorithm is that it be
scale invariant. This means that should the units in which one measures the decision
variables in a problem be changed, the algorithm would still behave in exactly the
same manner. The simplex method with the largest-coefficient rule is not scale
invariant. To see this, consider changing variables in the Klee-Minty problem by
putting

$$x_j = 100^{j-1}\,\bar{x}_j.$$

In the new variables, the initial dictionary for the $n = 3$ Klee-Minty problem
becomes

$$\begin{aligned}
\zeta &= -\tfrac{100}{2} b_1 - \tfrac{10}{2} b_2 - \tfrac{1}{2} b_3 + 100\bar{x}_1 + 1000\bar{x}_2 + 10000\bar{x}_3 \\
w_1 &= b_1 - \bar{x}_1 \\
w_2 &= 10b_1 + b_2 - 20\bar{x}_1 - 100\bar{x}_2 \\
w_3 &= 100b_1 + 10b_2 + b_3 - 200\bar{x}_1 - 2000\bar{x}_2 - 10000\bar{x}_3.
\end{aligned}$$

Now, the largest-coefficient rule picks variable $\bar{x}_3$ to enter. Variable $w_3$ leaves, and
the method steps to the optimal solution in just one iteration. There exist pivot rules
for the simplex method that are scale invariant. But Klee-Minty-like examples have
been found for most proposed alternative pivot rules (whether scale invariant or not).
In fact, it is an open question whether there exist pivot rules for which one can prove
that no problem instance requires an exponential number of iterations (as a function
of m or n).
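The effect of rescaling on the choice of entering variable can be seen from the objective row alone. The Python sketch below applies the change of variables in which the j-th variable is measured in units of 100^(j-1) to the n = 3 objective and reports which variable the largest-coefficient rule would pick before and after. The helper name `entering_index` is hypothetical, and indices are zero-based.

```python
def entering_index(coeffs):
    """Index of the largest positive coefficient (first on ties), or None if optimal."""
    best = max(coeffs)
    return coeffs.index(best) if best > 0 else None


n = 3
c = [10 ** (n - 1 - j) for j in range(n)]          # original objective row: 100, 10, 1
scale = [100 ** j for j in range(n)]               # unit change for each variable
c_scaled = [cj * sj for cj, sj in zip(c, scale)]   # rescaled objective row

print(entering_index(c), entering_index(c_scaled))
```

The rule selects the first variable in the original units and the third variable after rescaling, which is why the rescaled problem is solved in a single pivot.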

5.  Empirical  Average  Performance  of  the  Simplex  Method 

To investigate the empirical average case performance of the simplex method,
we generated a large number of random linear programming problems and solved
each of them using the simplex method. There are many ways to generate random
problems. In this section, we consider one such method and analyze the results.

As we have so dramatically seen, the number of iterations can depend on the
specific choice of pivot rule. In the code discussed below, we choose the entering
variable to be the one with the largest coefficient. The way we generate random
problems makes it unlikely that there will be ties in the choice of largest coefficient
(at least after the first pivot). So, even though the program makes a choice,
it is not terribly important to articulate what that choice is. Similarly, there is little
chance for a tie when choosing a leaving variable, so we do not dwell on this matter
either.

We list below the source code. The program is written in a language called
MATLAB, a widely used language whose source code is fairly easy to read, even
for those not familiar with the language. So, to get started, here's how we initialize
the data describing a linear programming problem:

    m = round(10*exp(log(100)*rand()));
    n = round(10*exp(log(100)*rand()));
    sigma = 10;
    A = round(sigma*(randn(m,n)));
    b = round(sigma*abs(randn(m,1)));
    c = round(sigma*randn(1,n));

Here, rand() generates a (pseudo) random number uniformly distributed on
the interval [0, 1] and round() simply rounds a number to its nearest integer value.
The formulas for m and n produce numbers between 10 and 1,000. The formula
may seem more complicated than one would expect. For example, one might suggest
this simpler formula: m = round(10 + 990*rand()). There is a good
reason for the more complicated version. We would like about half of the problems
to be between 10 and 100 and the other half to be between 100 and 1,000. Using the
simple scheme suggested above, only about 10% of the numbers would be between
10 and 100. The vast majority would be between 100 and 1,000. So, what we want
is to have our numbers uniformly distributed when viewed on a logarithmic scale.
Our formula for m and n achieves this logarithmically-uniform distribution.
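The difference between the two sampling schemes is easy to confirm by simulation. The Python sketch below draws many sizes with each formula and reports the fraction that falls below 100; the function names, the seed, and the number of trials are choices made here for illustration and do not come from the text.

```python
import math
import random


def log_uniform_size(rng, lo=10, hi=1000):
    """Analogue of round(10*exp(log(100)*rand())): uniform on a logarithmic scale."""
    return round(lo * math.exp(math.log(hi / lo) * rng.random()))


def uniform_size(rng, lo=10, hi=1000):
    """Analogue of round(10 + 990*rand()): uniform on a linear scale."""
    return round(lo + (hi - lo) * rng.random())


def fraction_below(sampler, threshold=100, trials=100_000, seed=0):
    """Fraction of sampled sizes that are smaller than the threshold."""
    rng = random.Random(seed)
    return sum(sampler(rng) < threshold for _ in range(trials)) / trials


frac_log = fraction_below(log_uniform_size)
frac_uni = fraction_below(uniform_size)
print(frac_log, frac_uni)
```

The first fraction should come out close to one half and the second close to one tenth, in line with the argument above.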

The vector c of objective function coefficients is generated using the function
randn(). This function is like rand() but, instead of generating numbers with a
uniform distribution, it generates numbers with a Gaussian (aka "normal") distribution
with mean 0 and standard deviation 1. Multiplying such a variable by sigma,
which is set to 10, increases the standard deviation to 10. The arguments (1,n)
passed to randn() tell the random number generator not to produce just one such
number but rather to produce a 1 x n matrix, i.e., a row vector, of independent
instances of these random variables. There is no particular reason to round the
coefficients in the row vector c to be integers. The only reason this was done was to
make the random problems seem slightly more realistic since many (if not most) real-world
problems involve data that is mostly integer-valued. The matrix A and the
right-hand side vector b are generated in a manner similar to how c is generated. But,
note that the formula for b involves the function abs(), which returns absolute values so
that all elements of b are non-negative. This minor, but important, twist is done to
ensure that the starting dictionary is feasible. Here's the main pivot loop:
    iter = 0;
    while max(c) > eps,
        % pick largest coefficient
        [cj, col] = max(c);
        Acol = A(:,col);

        % select leaving variable
        if sum(Acol<-eps) == 0,
            opt = -1;   % unbounded
            'unbounded'
            break;
        end
        nums = b.*(Acol<-eps);
        dens = -Acol.*(Acol<-eps);
        [t, row] = min(nums./dens);

        Arow = A(row,:);
        a = A(row,col);   % pivot element

        A = A - Acol*Arow/a;
        A(row,:) = -Arow/a;
        A(:,col) = Acol/a;
        A(row,col) = 1/a;

        brow = b(row);
        b = b - brow*Acol/a;
        b(row) = -brow/a;

        ccol = c(col);
        c = c - ccol*Arow/a;
        c(col) = ccol/a;

        iter = iter+1;
    end

In this code, the expression max(c) computes the maximum value of the elements
of the vector c. It returns both the maximum value and the index at which this value
was attained (the first such index if there is more than one). So, cj is the maximal
coefficient and col is the index at which this maximum is attained. Hence col
is the entering column. Given the matrix A, the expression A(:,col) denotes
the column vector consisting of the elements from column col of A. Hence,
Acol denotes the column of the dictionary associated with the entering variable.
The next lines of this short code select the leaving variable, which is in the row
called row. Given the entering column and the leaving row, all that remains is to
update the coefficients in the objective function, the right-hand side, and the array
of coefficients A. The last few lines encode exactly what one needs to do to carry
out a pivot.
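The selection of the leaving variable is the least obvious part of the loop, so here is the same ratio test written out as a plain Python function. It is a sketch under the listing's sign convention (a dictionary row decreases as the entering variable grows exactly when its entry in the entering column is negative); the function name `ratio_test` and the use of machine epsilon as the tolerance are assumptions made for this illustration.

```python
import sys

EPS = sys.float_info.epsilon   # plays the role of eps in the listing


def ratio_test(b, acol, eps=EPS):
    """Return the index of the leaving row, or None if the problem is unbounded.

    b    : current right-hand sides (all nonnegative)
    acol : dictionary column of the entering variable
    """
    best_row, best_ratio = None, None
    for i, (bi, ai) in enumerate(zip(b, acol)):
        if ai < -eps:                      # only these rows limit the increase
            ratio = bi / -ai
            if best_ratio is None or ratio < best_ratio:
                best_row, best_ratio = i, ratio
    return best_row


print(ratio_test([6, 4, 3], [-2, 1, -3]))
```

Rows whose entry is zero or positive are skipped, and if every row is skipped the entering variable can grow without bound, which is the unbounded case handled by the break statement in the listing.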
Figure 4.1. Starting from a primal feasible solution and using
the primal simplex method to pivot to an optimal solution, shown
here is a log-log plot showing the number of pivots required to
reach optimality (or discover that the problem is unbounded) plotted
against m + n. Points plotted in blue correspond to problems
having an optimal solution whereas points plotted in green correspond
to unbounded problems.


The  code  was  run  1,000  times  and  for  each  randomly  generated  problem  the 
values  of  m,  n,  and  the  number  of  iterations,  iter,  were  saved.  Figure  4.1  shows  a 
plot  of  the  number  of  pivots  plotted  against  the  sum  m+n.  Note  that  this  is  a  log-log 
plot.  That  is,  both  the  horizontal  and  vertical  axes  are  stretched  logarithmically. 

The  data  points  shown  in  blue  correspond  to  problems  where  an  optimal  solution 
was  obtained  whereas  those  shown  in  green  correspond  to  unbounded  problems. 
Clearly, unboundedness is a common occurrence for problems generated randomly.
In  fact,  of  the  1,000  problems,  501  had  optimal  solutions  and  the  remaining  499 
were  unbounded.  These  numbers  suggest  that  the  probability  of  encountering  an 
unbounded  problem  is  exactly  one  half.  This  is  likely  true  but  it  has  not  been 
proven.  Further  investigation  reveals  that  instances  where  m  >  n  are  almost  never 
unbounded  whereas  the  preponderance  of  m  <  n  instances  are  unbounded.  Of 
course,  the  way  m  and  n  were  generated,  it  is  true  that  m  <  n  and  m  >  n  are 
equally  likely. 
Figure 4.2. The same data as shown in Figure 4.1 but plotted
against n instead of m + n.


One might be tempted to add code that rounds to zero any right-hand side value
that is within a small tolerance of zero. For example, we could add

    b = b.*(abs(b)>eps);

just before the line defining row. Doing so forces dictionaries to be degenerate and
this can lead to cycling. Experiments show that, with this extra line of code, about 5
out of 1,000 instances will cycle due to degeneracy. Without the extra line of code,
no instances cycled.

A  second  observation  is  that,  for  a  given  value  of  m  +  n,  there  seems  to  be  an 
effective  upper  limit  on  the  number  of  pivots  required.  Some  problems,  especially 
the  unbounded  ones  but  also  many  having  an  optimal  solution,  solve  in  many  fewer 
iterations. 

The  fact  that  some  problems  solved  quickly  even  when  m+n  was  large  suggests 
that  perhaps  m  +  n  is  not  the  best  measure  of  problem  size.  Figure  4.2  shows  the 
same  data  plotted  using  just  n  as  the  measure  of  problem  size.  Interestingly,  this 
change  dramatically  improves  the  correlation  between  size  and  number  of  iterations 
for  those  problems  that  arrived  at  an  optimal  solution.  But,  the  unbounded  problems 
are  still  spread  out  quite  a  bit. 

Figure 4.3. The same data as before but plotted against the minimum
of m and n.


Upon reflection, it seems that perhaps a problem is "easy" if either m or n is
small relative to the other. To test this idea, we plot the number of iterations against
the minimum of m and n. This plot is shown in Figure 4.3. It appears that we
have finally arrived at the best measure of size. Performing a statistical analysis (see
Chapter 12), we can empirically derive the straight line through this log-log plot that
best matches the data. Separate lines for the blue (optimal) and green (unbounded)
points are shown on the graph. For the blue points, the equation is given by

$$\log T \approx -1.90 + 1.70 \log(\min(m, n)),$$

where T denotes the number of pivots required to solve a problem. Taking the
exponential of both sides, we get

$$T \approx e^{-1.90}\, e^{1.70 \log(\min(m, n))} = 0.150\, \min(m, n)^{1.70}.$$

For the green (unbounded) points, the equation is

$$T \approx 0.180\, \min(m, n)^{1.42}.$$

In both cases, the rate of growth of T with respect to min(m, n) is "superlinear" as
the exponents are both larger than 1. Figure 4.4 is a regular, that is, not log-log, plot
of the number of simplex pivots versus the minimum of m and n. This plot makes
the superlinearity easy to spot.
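A straight-line fit on a log-log plot is the same thing as fitting a power law. The Python sketch below shows one way to do this, using ordinary least squares on the logarithms; that choice of fitting method, the function names, and the synthetic data are all assumptions made here, since the text only refers to Chapter 12 for the statistical analysis actually used.

```python
import math


def fit_power_law(sizes, pivots):
    """Least-squares line through (log size, log pivots).

    Returns (coef, exponent) such that pivots is approximately coef * size**exponent.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in pivots]
    k = len(xs)
    xbar, ybar = sum(xs) / k, sum(ys) / k
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    intercept = ybar - slope * xbar
    return math.exp(intercept), slope


def predicted_pivots(size, coef=0.150, exponent=1.70):
    """Evaluate the fitted model for problems that have an optimal solution."""
    return coef * size ** exponent


# Synthetic data lying exactly on the fitted curve, used only as a self-check.
sizes = [10, 30, 100, 300, 1000]
coef, exponent = fit_power_law(sizes, [predicted_pivots(s) for s in sizes])
print(coef, exponent, predicted_pivots(100))
```

Under the fitted model, a problem with min(m, n) = 100 would be expected to need somewhere between 370 and 380 pivots.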

Finally,  a  careful  comparison  of  the  blue  (optimal  solution)  data  points  in 
Figures  4.2  and  4.3  reveals  that  they  are  almost  all  in  exactly  the  same  position 
in  the  two  plots.  The  reason  is  that  the  vast  majority  of  the  problems  that  had  an 
optimal  solution  were  problems  in  which  n  was  smaller  than  m. 
Figure 4.4. The same comparison as in Figure 4.3 but plotted linearly
rather than log-log. This version makes clear that the number
of pivots grows faster than linearly.


Exercises 

In solving the following problems, the simple pivot tool can be used to check
your arithmetic:

www.princeton.edu/~rvdb/JAVA/pivot/simple.html

4.1 Compare the performance of the largest-coefficient and the smallest-index
pivoting rules on the following linear program:

$$\begin{array}{ll}
\text{maximize} & 4x_1 + 5x_2 \\
\text{subject to} & 2x_1 + 2x_2 \le 9 \\
 & x_1 \le 4 \\
 & x_2 \le 3 \\
 & x_1, x_2 \ge 0.
\end{array}$$

4.2 Compare the performance of the largest-coefficient and the smallest-index
pivoting rules on the following linear program:

$$\begin{array}{ll}
\text{maximize} & 2x_1 + x_2 \\
\text{subject to} & 3x_1 + x_2 \le 3 \\
 & x_1, x_2 \ge 0.
\end{array}$$
4.3 Compare the performance of the largest-coefficient and the smallest-index
pivoting rules on the following linear program:

$$\begin{array}{ll}
\text{maximize} & 3x_1 + 5x_2 \\
\text{subject to} & x_1 + 2x_2 \le 5 \\
 & x_1 \le 3 \\
 & x_2 \le 2 \\
 & x_1, x_2 \ge 0.
\end{array}$$

4.4 Solve the Klee-Minty problem (4.1) for n = 3.

4.5 Solve the four variable Klee-Minty problem using the online pivot tool:

www.princeton.edu/~rvdb/JAVA/pivot/kleeminty.html

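4.6 Consider the dictionary

$$\begin{aligned}
\zeta &= -\sum_{j=1}^{n} \epsilon_j 10^{n-j} \left( \tfrac{1}{2} b_j - x_j \right) \\
w_i &= \sum_{j=1}^{i-1} \epsilon_i \epsilon_j 10^{i-j} (b_j - 2x_j) + (b_i - x_i), \qquad i = 1, 2, \dots, n,
\end{aligned}$$

where the $b_i$'s are as in the Klee-Minty problem (4.2) and where each $\epsilon_i$
is $\pm 1$. Fix k and consider the pivot in which $x_k$ enters the basis and $w_k$
leaves the basis. Show that the resulting dictionary is of the same form as
before. How are the new $\epsilon_i$'s related to the old $\epsilon_i$'s?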


4.7 Use the result of the previous problem to show that the Klee-Minty problem
(4.2) requires $2^n - 1$ iterations.

4.8 Consider the Klee-Minty problem (4.2). Suppose that $b_i = \beta^{i-1}$ for some
$\beta > 1$. Find the greatest lower bound on the set of $\beta$'s for which this
problem requires $2^n - 1$ iterations.


4.9 Show that, for any integer n,

$$\frac{1}{2n}\,2^{2n} \le \binom{2n}{n} \le 2^{2n}.$$


4.10  Consider  a  linear  programming  problem  that  has  an  optimal  dictionary 
in  which  exactly  k  of  the  original  slack  variables  are  nonbasic.  Show 
that  by  ignoring  feasibility  preservation  of  intermediate  dictionaries 
this  dictionary  can  be  arrived  at  in  exactly  k  pivots.  Don’t  forget  to 
allow  for  the  fact  that  some  pivot  elements  might  be  zero.  Hint:  see 
Exercise  2.15. 


4.11 (MATLAB required.) Modify the MATLAB code posted at

www.princeton.edu/~rvdb/LPbook/complexity/primalsimplex.m

so that data elements in A, b, and c are not rounded off to integers. Run
the code and compare the results to those shown in Figure 4.3.

4.12 (MATLAB required.) Modify the MATLAB code posted at

www.princeton.edu/~rvdb/LPbook/complexity/primalsimplex.m

so that the output is a log-log plot of the number of pivots versus the
product m times n. Run the code and compare the results to those shown
in Figure 4.3.


Notes 

The first example of a linear programming problem in n variables and n constraints
taking $2^n - 1$ iterations to solve was published by Klee and Minty (1972).
Several researchers, including Smale (1983), Borgwardt (1982, 1987a), Adler and
Megiddo (1985), and Todd (1986), have studied the average number of iterations.
For a survey of probabilistic methods, the reader should consult Borgwardt (1987b).

Roughly speaking, a class of problems is said to have polynomial complexity
if there is a polynomial p for which every problem of "size" n in the class can be
solved by some algorithm in at most p(n) operations. For many years it was unknown
whether linear programming had polynomial complexity. The Klee-Minty
examples show that, if linear programming is polynomial, then the simplex method
is not the algorithm that gives the polynomial bound, since $2^n$ is not dominated by
any polynomial. In 1979, Khachian gave a new algorithm for linear programming,
called the ellipsoid method, which is polynomial and therefore established once and
for all that linear programming has polynomial complexity. The collection of all
problem classes having polynomial complexity is usually denoted by $\mathcal{P}$. A class of
problems is said to belong to the class $\mathcal{NP}$ if, given a (proposed) solution, one can
verify its optimality in a number of operations that is bounded by some polynomial
in the "size" of the problem. Clearly, $\mathcal{P} \subset \mathcal{NP}$ (since, if we can solve from scratch
in a polynomial amount of time, surely we can verify optimality at least that fast).
An important problem in theoretical computer science is to determine whether or
not $\mathcal{P}$ is a strict subset of $\mathcal{NP}$.

The  study  of  how  difficult  it  is  to  solve  a  class  of  problems  is  called  complexity 
theory.  Readers  interested  in  pursuing  this  subject  further  should  consult  Garey  and 
Johnson  (1977). 


CHAPTER  5 


Duality  Theory 


Associated  with  every  linear  program  is  another  called  its  dual.  The  dual  of  this 
dual  linear  program  is  the  original  linear  program  (which  is  then  referred  to  as  the 
primal  linear  program).  Hence,  linear  programs  come  in  primal/dual  pairs.  It  turns 
out  that  every  feasible  solution  for  one  of  these  two  linear  programs  gives  a  bound 
on  the  optimal  objective  function  value  for  the  other.  These  ideas  are  important  and 
form  a  subject  called  duality  theory,  which  is  the  topic  of  this  chapter. 

1.  Motivation:  Finding  Upper  Bounds 

We  begin  with  an  example: 

$$
\begin{array}{ll}
\text{maximize}   & 4x_1 + x_2 + 3x_3 \\
\text{subject to} & x_1 + 4x_2 \le 1 \\
                  & 3x_1 - x_2 + x_3 \le 3 \\
                  & x_1,\ x_2,\ x_3 \ge 0.
\end{array}
$$

Our first observation is that every feasible solution provides a lower bound on the
optimal objective function value, $\zeta^*$. For example, the solution $(x_1, x_2, x_3) =
(1, 0, 0)$ tells us that $\zeta^* \ge 4$. Using the feasible solution $(x_1, x_2, x_3) = (0, 0, 3)$, we
see that $\zeta^* \ge 9$. But how good is this bound? Is it close to the optimal value? To
answer, we need to give upper bounds, which we can find as follows. Let's multiply
the first constraint by 2 and add that to 3 times the second constraint:

$$
\begin{array}{rcl}
2\,(x_1 + 4x_2) & \le & 2(1) \\
{}+ 3\,(3x_1 - x_2 + x_3) & \le & 3(3) \\ \hline
11x_1 + 5x_2 + 3x_3 & \le & 11.
\end{array}
$$

Now, since each variable is nonnegative, we can compare the sum against the
objective function and notice that

$$4x_1 + x_2 + 3x_3 \le 11x_1 + 5x_2 + 3x_3 \le 11.$$

Hence, $\zeta^* \le 11$. We have localized the search to somewhere between 9 and 11.
These  bounds  leave  a  gap  (within  which  the  optimal  solution  lies),  but  they  are  better 
than  nothing.  Furthermore,  they  can  be  improved.  To  get  a  better  upper  bound,  we 
again  apply  the  same  upper  bounding  technique,  but  we  replace  the  specific  numbers 
we  used  before  with  variables  and  then  try  to  find  the  values  of  those  variables 
that  give  us  the  best  upper  bound.  So  we  start  by  multiplying  the  two  constraints 
by nonnegative numbers, $y_1$ and $y_2$, respectively. The fact that these numbers are
nonnegative implies that they preserve the direction of the inequalities. Hence,

$$
\begin{array}{rcl}
y_1\,(x_1 + 4x_2) & \le & y_1 \\
{}+ y_2\,(3x_1 - x_2 + x_3) & \le & 3y_2 \\ \hline
(y_1 + 3y_2)x_1 + (4y_1 - y_2)x_2 + (y_2)x_3 & \le & y_1 + 3y_2.
\end{array}
$$

If we stipulate that each of the coefficients of the $x_j$'s be at least as large as the
corresponding coefficient in the objective function,

$$
\begin{array}{rcl}
y_1 + 3y_2 & \ge & 4 \\
4y_1 - y_2 & \ge & 1 \\
y_2 & \ge & 3,
\end{array}
$$

then we can compare the objective function against this sum (and its bound):

$$
\begin{aligned}
\zeta &= 4x_1 + x_2 + 3x_3 \\
      &\le (y_1 + 3y_2)x_1 + (4y_1 - y_2)x_2 + (y_2)x_3 \\
      &\le y_1 + 3y_2.
\end{aligned}
$$

We now have an upper bound, $y_1 + 3y_2$, which we should minimize in our effort
to  obtain  the  best  possible  upper  bound.  Therefore,  we  are  naturally  led  to  the 
following  optimization  problem: 


$$
\begin{array}{ll}
\text{minimize}   & y_1 + 3y_2 \\
\text{subject to} & y_1 + 3y_2 \ge 4 \\
                  & 4y_1 - y_2 \ge 1 \\
                  & y_2 \ge 3 \\
                  & y_1,\ y_2 \ge 0.
\end{array}
$$

This  problem  is  called  the  dual  linear  programming  problem  associated  with  the 
given  linear  programming  problem.  In  the  next  section,  we  will  define  the  dual 
linear  programming  problem  in  general. 
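The bounding argument of this section can be checked mechanically. The following Python sketch is only an illustration (the function name `upper_bound_from_multipliers` and the data layout are our own assumptions, not part of the book's software): it forms the weighted sum of the constraints and reports the resulting bound only when the multipliers are nonnegative and every combined coefficient is at least the matching objective coefficient.

```python
def upper_bound_from_multipliers(A, b, c, y):
    """Return the bound sum_i y[i]*b[i] on the optimal value of
    max c.x  s.t.  A x <= b, x >= 0,
    or None if the multipliers y do not certify an upper bound."""
    if any(yi < 0 for yi in y):
        return None  # a negative multiplier would reverse an inequality
    m, n = len(A), len(c)
    # Coefficient of x_j in the weighted sum of the constraints.
    combined = [sum(y[i] * A[i][j] for i in range(m)) for j in range(n)]
    if any(combined[j] < c[j] for j in range(n)):
        return None  # the sum does not dominate the objective function
    return sum(y[i] * b[i] for i in range(m))


# Data of the example in this section.
A = [[1, 4, 0], [3, -1, 1]]
b = [1, 3]
c = [4, 1, 3]

print(upper_bound_from_multipliers(A, b, c, [2, 3]))  # 11
```

Multipliers that fail the test, such as $(1, 1)$, yield `None`; minimizing the returned value over all admissible multipliers is exactly the dual problem.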


2.  The  Dual  Problem 

Given  a  linear  programming  problem  in  standard  form, 


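$$
\begin{array}{lll}
\text{maximize}   & \displaystyle\sum_{j=1}^{n} c_j x_j & \\
\text{subject to} & \displaystyle\sum_{j=1}^{n} a_{ij} x_j \le b_i & i = 1, 2, \ldots, m \\
                  & x_j \ge 0 & j = 1, 2, \ldots, n,
\end{array}
\tag{5.1}
$$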
the associated dual linear program is given by

$$
\begin{array}{lll}
\text{minimize}   & \displaystyle\sum_{i=1}^{m} b_i y_i & \\
\text{subject to} & \displaystyle\sum_{i=1}^{m} y_i a_{ij} \ge c_j & j = 1, 2, \ldots, n \\
                  & y_i \ge 0 & i = 1, 2, \ldots, m.
\end{array}
$$

Since we started with (5.1), it is called the primal problem. Our first order of
business  is  to  show  that  taking  the  dual  of  the  dual  returns  us  to  the  primal.  To 
see  this,  we  first  must  write  the  dual  problem  in  standard  form.  That  is,  we  must 
change  the  minimization  into  a  maximization  and  we  must  change  the  first  set  of 
greater-than-or-equal-to  constraints  into  less-than-or-equal-to.  Of  course,  we  must 
effect  these  changes  without  altering  the  problem.  To  change  a  minimization  into  a 
maximization,  we  note  that  to  minimize  something  it  is  equivalent  to  maximize  its 
negative  and  then  negate  the  answer: 

$$\min \sum_{i=1}^{m} b_i y_i = -\max \left( -\sum_{i=1}^{m} b_i y_i \right).$$

To  change  the  direction  of  the  inequalities,  we  simply  multiply  through  by  minus 
one.  The  resulting  equivalent  representation  of  the  dual  problem  in  standard  form 
then  is 

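$$
\begin{array}{lll}
-\text{maximize}  & \displaystyle\sum_{i=1}^{m} (-b_i) y_i & \\
\text{subject to} & \displaystyle\sum_{i=1}^{m} (-a_{ij}) y_i \le (-c_j) & j = 1, 2, \ldots, n \\
                  & y_i \ge 0 & i = 1, 2, \ldots, m.
\end{array}
$$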

Now  we  can  take  its  dual: 


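$$
\begin{array}{lll}
-\text{minimize}  & \displaystyle\sum_{j=1}^{n} (-c_j) x_j & \\
\text{subject to} & \displaystyle\sum_{j=1}^{n} (-a_{ij}) x_j \ge (-b_i) & i = 1, 2, \ldots, m \\
                  & x_j \ge 0 & j = 1, 2, \ldots, n,
\end{array}
$$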

which  is  clearly  equivalent  to  the  primal  problem  as  formulated  in  (5.1). 
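The two-step argument above (rewrite the dual in standard form, then take its dual) is easy to replay on data. In the sketch below, a standard-form problem is stored as a triple `(c, A, b)`; this representation and the function name `dual_in_standard_form` are assumptions made for illustration. The leading minus sign in front of "maximize" is not tracked, so only the problem data are compared.

```python
def dual_in_standard_form(c, A, b):
    """Given  max c.x  s.t.  A x <= b, x >= 0,  return the data (c', A', b')
    of its dual rewritten as a standard-form maximization:
        max (-b).y  s.t.  (-A^T) y <= -c,  y >= 0.
    The optimal value of the returned problem is the negative of the
    dual's optimal value."""
    m, n = len(A), len(c)
    neg_A_transpose = [[-A[i][j] for i in range(m)] for j in range(n)]
    return [-bi for bi in b], neg_A_transpose, [-cj for cj in c]


# The example of Section 1.
c = [4, 1, 3]
A = [[1, 4, 0], [3, -1, 1]]
b = [1, 3]

dual = dual_in_standard_form(c, A, b)
print(dual)  # ([-1, -3], [[-1, -3], [-4, 1], [0, -1]], [-4, -1, -3])

# Taking the dual a second time returns the original data.
print(dual_in_standard_form(*dual) == (c, A, b))  # True
```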

3.  The  Weak  Duality  Theorem 

As  we  saw  in  our  example,  the  dual  problem  provides  upper  bounds  for  the 
primal  objective  function  value.  This  result  is  true  in  general  and  is  referred  to  as 
the Weak Duality Theorem:


[Figure 5.1 omitted: two panels, each marking "Primal Values" and "Dual Values" along
the real line; the first panel is labeled "Gap" and the second "No Gap".]

Figure 5.1. The primal objective values are all less than the dual
objective values. An important question is whether or not there is
a gap between the largest primal value and the smallest dual value.


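THEOREM 5.1. If $(x_1, x_2, \ldots, x_n)$ is feasible for the primal and $(y_1, y_2,
\ldots, y_m)$ is feasible for the dual, then

$$\sum_{j} c_j x_j \le \sum_{i} b_i y_i.$$

PROOF. The proof is a simple chain of obvious inequalities:

$$
\sum_{j} c_j x_j
\le \sum_{j} \Bigl( \sum_{i} y_i a_{ij} \Bigr) x_j
= \sum_{i} \Bigl( \sum_{j} a_{ij} x_j \Bigr) y_i
\le \sum_{i} b_i y_i,
$$

where the first inequality follows from the fact that each $x_j$ is nonnegative and each
$c_j$ is no larger than $\sum_i y_i a_{ij}$. The second inequality, of course, holds for similar
reasons. □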
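To see the chain of inequalities at work on numbers, the sketch below evaluates its three distinct quantities for the example of Section 1. The function name `weak_duality_chain` is a hypothetical helper introduced here for illustration.

```python
def weak_duality_chain(A, b, c, x, y):
    """Return (c.x, y^T A x, b.y). For a primal feasible x and a dual
    feasible y these three numbers are nondecreasing from left to right."""
    m, n = len(A), len(c)
    primal_value = sum(c[j] * x[j] for j in range(n))
    middle = sum(y[i] * A[i][j] * x[j] for i in range(m) for j in range(n))
    dual_value = sum(b[i] * y[i] for i in range(m))
    return primal_value, middle, dual_value


A = [[1, 4, 0], [3, -1, 1]]
b = [1, 3]
c = [4, 1, 3]

# x = (1, 0, 0) is primal feasible and y = (2, 3) is dual feasible.
print(weak_duality_chain(A, b, c, [1, 0, 0], [2, 3]))  # (4, 11, 11)
```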

Consider  the  subset  of  the  real  line  consisting  of  all  possible  values  for  the 
primal  objective  function,  and  consider  the  analogous  subset  associated  with  the 
dual  problem.  The  weak  duality  theorem  tells  us  that  the  set  of  primal  values  lies 
entirely  to  the  left  of  the  set  of  dual  values.  As  we  shall  see  shortly,  these  sets  are 
both  closed  intervals  (perhaps  of  infinite  extent),  and  the  right  endpoint  of  the  primal 
set  butts  up  against  the  left  endpoint  of  the  dual  set  (see  Figure  5.1).  That  is,  there  is 
no  gap  between  the  optimal  objective  function  value  for  the  primal  and  for  the  dual. 
The  lack  of  a  gap  between  primal  and  dual  objective  values  provides  a  convenient 
tool  for  verifying  optimality.  Indeed,  if  we  can  exhibit  a  feasible  primal  solution 
$(x_1^*, x_2^*, \ldots, x_n^*)$ and a feasible dual solution $(y_1^*, y_2^*, \ldots, y_m^*)$ for which

$$\sum_{j} c_j x_j^* = \sum_{i} b_i y_i^*,$$

then we may conclude that each of these solutions is optimal for its respective
problem. To see that the primal solution is optimal, consider any other feasible solution
$(x_1, x_2, \ldots, x_n)$. By the weak duality theorem, we have that

$$\sum_{j} c_j x_j \le \sum_{i} b_i y_i^* = \sum_{j} c_j x_j^*.$$

Now, since $(x_1^*, x_2^*, \ldots, x_n^*)$ was assumed to be feasible, we see that it must be
optimal. An analogous argument shows that the dual solution is also optimal for
the dual problem. As an example, consider the solutions $x = (0, 0.25, 3.25)$ and
$y = (1, 3)$ in our example. Both these solutions are feasible, and both yield an
objective value of 10. Hence, the weak duality theorem says that these solutions are
optimal.
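The optimality check just described needs only three tests: primal feasibility, dual feasibility, and equal objective values. A minimal sketch follows; the name `certifies_optimality` is ours, and exact rational arithmetic from the standard `fractions` module is used so that the equality test is not disturbed by rounding.

```python
from fractions import Fraction as F


def certifies_optimality(A, b, c, x, y):
    """True if x is primal feasible, y is dual feasible, and the two
    objective values agree, so that both are optimal by weak duality."""
    m, n = len(A), len(c)
    primal_ok = all(v >= 0 for v in x) and all(
        sum(A[i][j] * x[j] for j in range(n)) <= b[i] for i in range(m))
    dual_ok = all(v >= 0 for v in y) and all(
        sum(y[i] * A[i][j] for i in range(m)) >= c[j] for j in range(n))
    same_value = (sum(c[j] * x[j] for j in range(n))
                  == sum(b[i] * y[i] for i in range(m)))
    return primal_ok and dual_ok and same_value


A = [[1, 4, 0], [3, -1, 1]]
b = [1, 3]
c = [4, 1, 3]

x = [F(0), F(1, 4), F(13, 4)]   # x = (0, 0.25, 3.25)
y = [1, 3]
print(certifies_optimality(A, b, c, x, y))  # True
```

A feasible pair whose objective values differ, such as $x = (0, 0, 3)$ with the same $y$, is rejected: it proves only that the optimal value lies between 9 and 10.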


4.  The  Strong  Duality  Theorem 

The  fact  that  for  linear  programming  there  is  never  a  gap  between  the  primal 
and  the  dual  optimal  objective  values  is  usually  referred  to  as  the  Strong  Duality 
Theorem:

THEOREM 5.2. If the primal problem has an optimal solution,

$$x^* = (x_1^*, x_2^*, \ldots, x_n^*),$$

then the dual also has an optimal solution,

$$y^* = (y_1^*, y_2^*, \ldots, y_m^*),$$

such that

$$\sum_{j} c_j x_j^* = \sum_{i} b_i y_i^*. \tag{5.2}$$

Carefully written proofs, while attractive for their tightness, sometimes obfuscate
the main idea. In such cases, it is better to illustrate the idea with a simple
example. Anyone who has taken a course in linear algebra probably already
appreciates such a statement. In any case, it is true here as we explain the strong duality
theorem.

The  main  idea  that  we  wish  to  illustrate  here  is  that,  as  the  simplex  method 
solves  the  primal  problem,  it  also  implicitly  solves  the  dual  problem,  and  it  does  so 
in  such  a  way  that  (5.2)  holds. 

To see what we mean, let us return to the example discussed in Section 5.1.
We start by introducing variables $w_i$, $i = 1, 2$, for the primal slacks and $z_j$, $j =
1, 2, 3$, for the dual slacks. Since the inequality constraints in the dual problem are
greater-than constraints, each dual slack is defined as a left-hand side minus the
corresponding right-hand side. For example,

$$z_1 = y_1 + 3y_2 - 4.$$


Therefore, the primal and dual dictionaries are written as follows:

$$
\begin{array}{rcl}
\zeta &=& 4x_1 + x_2 + 3x_3 \\
w_1 &=& 1 - x_1 - 4x_2 \\
w_2 &=& 3 - 3x_1 + x_2 - x_3.
\end{array}
$$

$$
\begin{array}{rcl}
-\xi &=& -y_1 - 3y_2 \\
z_1 &=& -4 + y_1 + 3y_2 \\
z_2 &=& -1 + 4y_1 - y_2 \\
z_3 &=& -3 + y_2.
\end{array}
$$

Note that we have recorded the negative of the dual objective function, since we
prefer to maximize the objective function appearing in a dictionary. Also note that
the numbers in the dual dictionary are simply the negative of the numbers in the
primal dictionary arranged with the rows and columns interchanged. Indeed, stripping
away everything but the numbers, we have


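$$
(P)\quad
\begin{bmatrix}
0 & 4 & 1 & 3 \\
1 & -1 & -4 & 0 \\
3 & -3 & 1 & -1
\end{bmatrix}
\quad \overset{\text{neg.-transp.}}{\longleftrightarrow} \quad
(D)\quad
\begin{bmatrix}
0 & -1 & -3 \\
-4 & 1 & 3 \\
-1 & 4 & -1 \\
-3 & 0 & 1
\end{bmatrix}
$$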

That  is,  as  a  table  of  numbers,  the  dual  dictionary  is  the  negative  transpose  of  the 
primal  dictionary. 
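As a quick check of this observation, the sketch below stores each dictionary as a table of numbers (constants in the first column, objective row first) and computes the negative transpose. The list-of-rows layout and the function name `negative_transpose` are assumptions made for this illustration.

```python
def negative_transpose(table):
    """Return the negative transpose of a table given as a list of rows."""
    rows, cols = len(table), len(table[0])
    return [[-table[i][j] for i in range(rows)] for j in range(cols)]


# Primal dictionary: rows zeta, w1, w2; columns constant, x1, x2, x3.
P = [[0, 4, 1, 3],
     [1, -1, -4, 0],
     [3, -3, 1, -1]]

# Dual dictionary: rows -xi, z1, z2, z3; columns constant, y1, y2.
D = [[0, -1, -3],
     [-4, 1, 3],
     [-1, 4, -1],
     [-3, 0, 1]]

print(negative_transpose(P) == D)  # True
print(negative_transpose(D) == P)  # True
```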

Our  goal  now  is  to  apply  the  simplex  method  to  the  primal  problem  and  at  the 
same  time  perform  the  analogous  pivots  on  the  dual  problem.  We  shall  discover  that 
the  negative-transpose  property  persists  throughout  the  iterations. 

Since the primal dictionary is feasible, no Phase I procedure is necessary. For
the first pivot, we pick $x_3$ as the entering variable ($x_1$ has the largest coefficient,
but $x_3$ provides the greatest one-step increase in the objective). With this choice,
the leaving variable must be $w_2$. Since the rows and columns are interchanged in
the dual dictionary, we see that "column" $x_3$ in the primal dictionary corresponds
to "row" $z_3$ in the dual dictionary. Similarly, row $w_2$ in the primal corresponds to
column $y_2$ in the dual. Hence, to make an analogous pivot in the dual dictionary, we
select $y_2$ as the entering variable and $z_3$ as the leaving variable. While this choice of
entering and leaving variable may seem odd compared to how we have chosen
entering and leaving variables before, we should note that our earlier choice was guided
by the desire to increase the objective function while preserving feasibility. Here,
the dual dictionary is not even feasible, and so such considerations are meaningless.
Once we give up those rules for the choice of entering and leaving variables, it is
easy to see that a pivot can be performed with any choice of entering and leaving
variables provided only that the coefficient on the entering variable in the constraint
of the leaving variable does not vanish. Such is the case with the current choice.
Hence, we do the pivot in both the primal and the dual. The result is
$$
\begin{array}{rcl}
\zeta &=& 9 - 5x_1 + 4x_2 - 3w_2 \\
w_1 &=& 1 - x_1 - 4x_2 \\
x_3 &=& 3 - 3x_1 + x_2 - w_2.
\end{array}
$$

$$
\begin{array}{rcl}
-\xi &=& -9 - y_1 - 3z_3 \\
z_1 &=& 5 + y_1 + 3z_3 \\
z_2 &=& -4 + 4y_1 - z_3 \\
y_2 &=& 3 + z_3.
\end{array}
$$

Note that these two dictionaries still have the property of being negative transposes
of each other. For the next pivot, the entering variable in the primal dictionary is $x_2$
(this time there is no choice) and the leaving variable is $w_1$. In the dual dictionary,
the corresponding entering variable is $y_1$ and the leaving variable is $z_2$. Doing the
pivots, we get

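$$
(P)\quad
\begin{array}{rcl}
\zeta &=& 10 - 6x_1 - w_1 - 3w_2 \\
x_2 &=& 0.25 - 0.25x_1 - 0.25w_1 \\
x_3 &=& 3.25 - 3.25x_1 - 0.25w_1 - w_2.
\end{array}
$$

$$
(D)\quad
\begin{array}{rcl}
-\xi &=& -10 - 0.25z_2 - 3.25z_3 \\
z_1 &=& 6 + 0.25z_2 + 3.25z_3 \\
y_1 &=& 1 + 0.25z_2 + 0.25z_3 \\
y_2 &=& 3 + z_3.
\end{array}
$$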

This primal dictionary is optimal, since the coefficients in the objective row are all
negative. Looking at the dual dictionary, we see that it is now feasible for the
analogous reason. In fact, it is optimal too. Finally, both the primal and dual objective
function values are 10.

The situation should now be clear. Given a linear programming problem, which
is assumed to possess an optimal solution, first apply the Phase I procedure to get
a basic feasible starting dictionary for Phase II. Then apply the simplex method to
find an optimal solution. Each primal dictionary generated by the simplex method
implicitly defines a corresponding dual dictionary as follows: first write down the
negative transpose and then replace each $x_j$ with a $z_j$ and each $w_i$ with a $y_i$. As long
as the primal dictionary is not optimal, the implicitly defined dual dictionary will be
infeasible. But once an optimal primal dictionary is found, the corresponding dual
dictionary will be feasible. Since its objective coefficients are always nonpositive,
this feasible dual dictionary is also optimal. Furthermore, at each iteration, the
current primal objective function value coincides with the current dual objective
function value.

To see why the negative transpose property is preserved from one dictionary
to the next, let's observe the effect of one pivot. To keep notations uncluttered, we
consider only four generic entries in a table of coefficients: the pivot element, which
we denote by $a$, one other element on the pivot element's row, call it $b$, one other
in its column, call it $c$, and a fourth element, denoted $d$, chosen to make these four
entries into a rectangle. A little thought (and perhaps some staring at the examples
above) reveals that a pivot produces the following changes:


• The pivot element gets replaced by its reciprocal;

• Elements in the pivot row get negated and divided by the pivot element;

• Elements in the pivot column get divided by the pivot element; and

• All other elements, such as $d$, get decreased by $bc/a$.

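These effects can be summarized on our generic table as follows:

$$
\begin{bmatrix} b & a \\ d & c \end{bmatrix}
\quad \xrightarrow{\ \text{pivot}\ } \quad
\begin{bmatrix} -\dfrac{b}{a} & \dfrac{1}{a} \\[1.5ex] d - \dfrac{bc}{a} & \dfrac{c}{a} \end{bmatrix}
$$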
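Now, if we start with a dual dictionary that is the negative transpose of the primal
and apply one pivot operation, we get

$$
\begin{bmatrix} -b & -d \\ -a & -c \end{bmatrix}
\quad \xrightarrow{\ \text{pivot}\ } \quad
\begin{bmatrix} \dfrac{b}{a} & -d + \dfrac{bc}{a} \\[1.5ex] -\dfrac{1}{a} & -\dfrac{c}{a} \end{bmatrix}
$$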

Note  that  the  resulting  dual  table  is  the  negative  transpose  of  the  resulting  primal 
table.  By  induction  we  then  conclude  that,  if  we  start  with  this  property,  it  will  be 
preserved  throughout  the  solution  process. 
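The four pivot rules translate directly into code. The sketch below is an illustration only: the table layout (constants in column 0, objective in row 0) and the names `pivot` and `negative_transpose` are our own choices. It applies the two pivots of the example to both tables and confirms that the negative-transpose property survives each step.

```python
from fractions import Fraction


def negative_transpose(table):
    rows, cols = len(table), len(table[0])
    return [[-table[i][j] for i in range(rows)] for j in range(cols)]


def pivot(table, r, s):
    """Pivot on the entry in row r, column s, using the four rules above."""
    a = Fraction(table[r][s])
    if a == 0:
        raise ValueError("pivot element must be nonzero")
    new = []
    for i, row in enumerate(table):
        new_row = []
        for j, entry in enumerate(row):
            if i == r and j == s:
                new_row.append(1 / a)                # pivot -> reciprocal
            elif i == r:
                new_row.append(-entry / a)           # pivot row: negate, divide
            elif j == s:
                new_row.append(entry / a)            # pivot column: divide
            else:                                    # d -> d - b*c/a
                new_row.append(entry - table[r][j] * table[i][s] / a)
        new.append(new_row)
    return new


# Initial primal table (rows zeta, w1, w2; columns constant, x1, x2, x3).
P = [[0, 4, 1, 3], [1, -1, -4, 0], [3, -3, 1, -1]]
D = negative_transpose(P)

# First pivot: x3 enters, w2 leaves (row 2, column 3); in the dual the
# row and column indices trade places.
P1, D1 = pivot(P, 2, 3), pivot(D, 3, 2)
# Second pivot: x2 enters, w1 leaves (row 1, column 2).
P2, D2 = pivot(P1, 1, 2), pivot(D1, 2, 1)

print(negative_transpose(P1) == D1, negative_transpose(P2) == D2)  # True True
```

After the second pivot, the first row of `P2` holds the coefficients 10, -6, -1, -3 of the optimal objective row, in agreement with the final primal dictionary above.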

Since  the  strong  duality  theorem  is  the  most  important  theorem  in  this  book, 
we  present  here  a  careful  proof.  Those  readers  who  are  satisfied  with  the  above 
discussion  may  skip  the  proof. 


Proof of Theorem 5.2. It suffices to exhibit a dual feasible solution $y^*$
satisfying (5.2). Suppose we apply the simplex method. We know that the simplex
method produces an optimal solution whenever one exists, and we have assumed
that one does indeed exist. Hence, the final dictionary will be an optimal dictionary
for the primal problem. The objective function in this final dictionary is ordinarily
written as

$$\zeta = \bar{\zeta} + \sum_{j \in \mathcal{N}} \bar{c}_j x_j.$$

But, since this is the optimal dictionary and we prefer stars to bars for denoting
optimal "stuff," let us write $\zeta^*$ instead of $\bar{\zeta}$. Also, the collection of nonbasic
variables will generally consist of a combination of original variables as well as slack
variables. Instead of using $\bar{c}_j$ for the coefficients of these variables, let us use $c_j^*$ for
the objective coefficients corresponding to original variables, and let us use $d_i^*$ for
the objective coefficients corresponding to slack variables. Also, for those original
variables that are basic we put $c_j^* = 0$, and for those slack variables that are basic
we put $d_i^* = 0$. With these new notations, we can rewrite the objective function as

$$\zeta = \zeta^* + \sum_{j=1}^{n} c_j^* x_j + \sum_{i=1}^{m} d_i^* w_i.$$


As we know, $\zeta^*$ is the objective function value corresponding to the optimal primal
solution:

$$\zeta^* = \sum_{j=1}^{n} c_j x_j^*. \tag{5.3}$$

Now, put

$$y_i^* = -d_i^*, \qquad i = 1, 2, \ldots, m. \tag{5.4}$$

We shall show that $y^* = (y_1^*, y_2^*, \ldots, y_m^*)$ is feasible for the dual problem and
satisfies (5.2). To this end, we write the objective function two ways:


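$$
\begin{aligned}
\sum_{j=1}^{n} c_j x_j
&= \zeta^* + \sum_{j=1}^{n} c_j^* x_j + \sum_{i=1}^{m} d_i^* w_i \\
&= \zeta^* + \sum_{j=1}^{n} c_j^* x_j + \sum_{i=1}^{m} (-y_i^*) \Bigl( b_i - \sum_{j=1}^{n} a_{ij} x_j \Bigr) \\
&= \zeta^* - \sum_{i=1}^{m} b_i y_i^* + \sum_{j=1}^{n} \Bigl( c_j^* + \sum_{i=1}^{m} y_i^* a_{ij} \Bigr) x_j.
\end{aligned}
$$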

Since all these expressions are linear in the variables $x_j$, we can equate the
coefficients of each variable appearing on the left-hand side with the corresponding
coefficient appearing in the last expression on the right-hand side. We can also equate
the constant terms on the two sides. Hence,

$$\zeta^* = \sum_{i=1}^{m} b_i y_i^* \tag{5.5}$$

$$c_j = c_j^* + \sum_{i=1}^{m} y_i^* a_{ij}, \qquad j = 1, 2, \ldots, n. \tag{5.6}$$

Combining (5.3) and (5.5), we get that (5.2) holds. Also, the optimality of the
dictionary for the primal problem implies that each $c_j^*$ is nonpositive, and hence we
see from (5.6) that

$$\sum_{i=1}^{m} y_i^* a_{ij} \ge c_j, \qquad j = 1, 2, \ldots, n.$$

By the same reasoning, each $d_i^*$ is nonpositive, and so we see from (5.4) that

$$y_i^* \ge 0, \qquad i = 1, 2, \ldots, m.$$

These last two sets of inequalities are precisely the conditions that guarantee dual
feasibility. This completes the proof. □
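The construction in the proof can be replayed on the example of Section 5.1, whose final dictionary has objective row $\zeta = 10 - 6x_1 - w_1 - 3w_2$. In the sketch below the variable names (`c_star`, `d_star`, `dual_solution_from_dictionary`) are our own labels for the quantities in (5.4) through (5.6); the numbers are read off that dictionary.

```python
# Data of the example: max c.x  s.t.  A x <= b, x >= 0.
A = [[1, 4, 0], [3, -1, 1]]
b = [1, 3]
c = [4, 1, 3]

# Objective row of the final dictionary, zeta = 10 - 6 x1 - w1 - 3 w2,
# with zero coefficients for the basic variables x2 and x3.
zeta_star = 10
c_star = [-6, 0, 0]   # coefficients of x1, x2, x3
d_star = [-1, -3]     # coefficients of w1, w2


def dual_solution_from_dictionary(d_star):
    """Equation (5.4): y_i^* = -d_i^*."""
    return [-d for d in d_star]


y_star = dual_solution_from_dictionary(d_star)
m, n = len(b), len(c)

# (5.5): zeta^* equals the dual objective value.
assert zeta_star == sum(b[i] * y_star[i] for i in range(m))
# (5.6): c_j = c_j^* + sum_i y_i^* a_ij for every j.
assert all(c[j] == c_star[j] + sum(y_star[i] * A[i][j] for i in range(m))
           for j in range(n))

print(y_star)  # [1, 3]
```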

The strong duality theorem tells us that, whenever the primal problem has an
optimal solution, the dual problem has one also and there is no duality gap. But what
if the primal problem does not have an optimal solution? For example, suppose that
it is unbounded. The unboundedness of the primal together with the weak duality
theorem tells us immediately that the dual problem must be infeasible. Similarly,
if the dual problem is unbounded, then the primal problem must be infeasible. It
is natural to hope that these three cases are the only possibilities, because if they
were we could then think of the strong duality theorem holding globally. That is,
even if, say, the primal is unbounded, the fact that then the dual is infeasible is like
saying that the primal and dual have a zero duality gap sitting out at $+\infty$. Similarly,
an infeasible primal together with an unbounded dual could be viewed as a pair in
which the gap is zero and sits at $-\infty$.

But it turns out that there is a fourth possibility that sometimes occurs: it can
happen that both the primal and the dual problems are infeasible. For example,
consider the following problem:

$$
\begin{array}{ll}
\text{maximize}   & 2x_1 - x_2 \\
\text{subject to} & x_1 - x_2 \le 1 \\
                  & -x_1 + x_2 \le -2 \\
                  & x_1,\ x_2 \ge 0.
\end{array}
$$

It is easy to see that both this problem and its dual are infeasible. For these problems,
one can think of there being a huge duality gap extending from $-\infty$ to $+\infty$.
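For readers who want the "easy to see" step spelled out, here is one way to verify both claims, using only the definition of the dual from Section 2. Adding the two constraints of each problem produces a contradiction:

```latex
% Primal: add the two constraints.
(x_1 - x_2) + (-x_1 + x_2) \le 1 + (-2)
\quad\Longrightarrow\quad 0 \le -1 .

% Dual of the problem above:
%   minimize    y_1 - 2 y_2
%   subject to  y_1 - y_2 \ge 2,   -y_1 + y_2 \ge -1,   y_1, y_2 \ge 0.
% Add the two dual constraints.
(y_1 - y_2) + (-y_1 + y_2) \ge 2 + (-1)
\quad\Longrightarrow\quad 0 \ge 1 .
```

Neither inequality can hold, so neither problem has a feasible solution.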

Duality theory is often useful in that it provides a certificate of optimality. For
example, suppose that you were asked to solve a really huge and difficult linear
program. After spending weeks or months at the computer, you are finally able
to get the simplex method to solve the problem, producing as it does an optimal
dual solution $y^*$ in addition to the optimal primal solution $x^*$. Now, how are you
going to convince your boss that your solution is correct? Do you really want to ask
her to verify the correctness of your computer programs? The answer is probably
not. And in fact it is not necessary. All you need to do is supply the primal and
the dual solution, and she only has to check that the primal solution is feasible for
the primal problem (that's easy), the dual solution is feasible for the dual problem
(that's just as easy), and the primal and dual objective values agree (and that's even
easier). Certificates of optimality have also been known to dramatically reduce the
amount of time certain underpaid professors have to devote to grading homework
assignments!

As  we’ve  seen,  the  simplex  method  applied  to  a  primal  problem  actually  solves 
both  the  primal  and  the  dual.  Since  the  dual  of  the  dual  is  the  primal,  applying 
the  simplex  method  to  the  dual  also  solves  both  the  primal  and  the  dual  problem. 
Sometimes it is easier to apply the simplex method to the dual, for example, if the
dual  has  an  obvious  basic  feasible  solution  but  the  primal  does  not.  We  take  up  this 
topic  in  the  next  chapter. 


5.  Complementary  Slackness 

Sometimes it is necessary to recover an optimal dual solution when only an
optimal primal solution is known. The following theorem, known as the
Complementary Slackness Theorem, can help in this regard.

THEOREM 5.3. Suppose that $x = (x_1, x_2, \ldots, x_n)$ is primal feasible and that
$y = (y_1, y_2, \ldots, y_m)$ is dual feasible. Let $(w_1, w_2, \ldots, w_m)$ denote the
corresponding primal slack variables, and let $(z_1, z_2, \ldots, z_n)$ denote the corresponding dual
slack variables. Then $x$ and $y$ are optimal for their respective problems if and only if

$$
\begin{array}{ll}
x_j z_j = 0, & \text{for } j = 1, 2, \ldots, n, \\
w_i y_i = 0, & \text{for } i = 1, 2, \ldots, m.
\end{array}
\tag{5.7}
$$

PROOF. We begin by revisiting the chain of inequalities used to prove the weak
duality theorem:

$$\sum_{j} c_j x_j \le \sum_{j} \Bigl( \sum_{i} y_i a_{ij} \Bigr) x_j \tag{5.8}$$

$$= \sum_{i} \Bigl( \sum_{j} a_{ij} x_j \Bigr) y_i \le \sum_{i} b_i y_i. \tag{5.9}$$

Recall that the first inequality arises from the fact that each term in the left-hand
sum is dominated by the corresponding term in the right-hand sum. Furthermore,
this domination is a consequence of the fact that each $x_j$ is nonnegative and

$$c_j \le \sum_{i} y_i a_{ij}.$$

Hence, inequality (5.8) will be an equality if and only if, for every $j = 1, 2, \ldots, n$,
either $x_j = 0$ or $c_j = \sum_i y_i a_{ij}$. But since

$$z_j = \sum_{i} y_i a_{ij} - c_j,$$

we see that the alternative to $x_j = 0$ is simply that $z_j = 0$. Of course, the
statement that at least one of these two numbers vanishes can be succinctly expressed by
saying that the product vanishes.

An analogous analysis of inequality (5.9) shows that it is an equality if and only
if (5.7) holds. This then completes the proof. □

Suppose that we have a nondegenerate primal basic optimal solution

$$x^* = (x_1^*, x_2^*, \ldots, x_n^*)$$

and we wish to find a corresponding optimal solution for the dual. Let

$$w^* = (w_1^*, w_2^*, \ldots, w_m^*)$$

denote the corresponding slack variables, which were probably given along with the
$x_j$'s but if not can be easily obtained from their definition as slack variables:

$$w_i^* = b_i - \sum_{j} a_{ij} x_j^*.$$


The dual constraints are

$$\sum_{i} y_i a_{ij} - z_j = c_j, \qquad j = 1, 2, \ldots, n, \tag{5.10}$$

where we have written the inequalities in equality form by introducing slack
variables $z_j$, $j = 1, 2, \ldots, n$. These constraints form $n$ equations in $m + n$ unknowns.
But the basic optimal solution $(x^*, w^*)$ is a collection of $n + m$ variables, many of
which are positive. In fact, since the primal solution is assumed to be nondegenerate,
it follows that the $m$ basic variables will be strictly positive. The complementary
slackness theorem then tells us that the corresponding dual variables must vanish.
Hence, of the $m + n$ variables in (5.10), we can set $m$ of them to zero. We are then
left with just $n$ equations in $n$ unknowns, which we would expect to have a unique
solution that can be solved for. If there is a unique solution, all the components
should be nonnegative. If any are negative, this would stand in contradiction to the
assumed optimality of $x^*$.
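The recovery procedure just described can be sketched in a few lines of Python. Everything here is illustrative: the function names are ours, the small Gauss-Jordan solver assumes that the reduced system is nonsingular, and the input is assumed to be a nondegenerate basic optimal solution given in exact rational arithmetic.

```python
from fractions import Fraction


def solve_linear_system(M, rhs):
    """Solve the square system M u = rhs by Gauss-Jordan elimination in
    exact arithmetic. Assumes M is nonsingular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def dual_from_complementary_slackness(A, b, c, x_star):
    """Recover (y, z) from a nondegenerate basic optimal x_star via (5.10)."""
    m, n = len(A), len(c)
    w_star = [b[i] - sum(A[i][j] * x_star[j] for j in range(n)) for i in range(m)]
    # Unknowns are indexed 0..m-1 for y_i and m..m+n-1 for z_j. A dual
    # variable may be nonzero only if its complementary primal variable is zero.
    free = ([i for i in range(m) if w_star[i] == 0]
            + [m + j for j in range(n) if x_star[j] == 0])
    if len(free) != n:
        raise ValueError("x_star is not a nondegenerate basic solution")
    # Row j encodes  sum_i y_i a_ij - z_j = c_j  restricted to free unknowns.
    M = [[A[k][j] if k < m else (-1 if k - m == j else 0) for k in free]
         for j in range(n)]
    values = solve_linear_system(M, c)
    y, z = [Fraction(0)] * m, [Fraction(0)] * n
    for k, v in zip(free, values):
        if k < m:
            y[k] = v
        else:
            z[k - m] = v
    return y, z


A = [[1, 4, 0], [3, -1, 1]]
b = [1, 3]
c = [4, 1, 3]
x_star = [Fraction(0), Fraction(1, 4), Fraction(13, 4)]

y, z = dual_from_complementary_slackness(A, b, c, x_star)
print(y == [1, 3], z == [6, 0, 0])  # True True
```

For the example of Section 5.1 this reproduces the dual solution $y = (1, 3)$ with dual slack $z_1 = 6$, the same values that appear in the final dual dictionary.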


6.  The  Dual  Simplex  Method 

In  this  section,  we  study  what  happens  if  we  apply  the  simplex  method  to  the 
dual  problem.  As  we  saw  in  our  discussion  of  the  strong  duality  theorem,  one  can 
actually  apply  the  simplex  method  to  the  dual  problem  without  ever  writing  down 
the  dual  problem  or  its  dictionaries.  Instead,  the  so-called  dual  simplex  method  is 
seen  simply  as  a  new  way  of  picking  the  entering  and  leaving  variables  in  a  sequence 
of  primal  dictionaries. 

We begin with an example:

$$
\begin{array}{ll}
\text{maximize}   & -x_1 - x_2 \\
\text{subject to} & -2x_1 - x_2 \le 4 \\
                  & -2x_1 + 4x_2 \le -8 \\
                  & -x_1 + 3x_2 \le -7 \\
                  & x_1,\ x_2 \ge 0.
\end{array}
$$

The dual of this problem is

$$
\begin{array}{ll}
\text{minimize}   & 4y_1 - 8y_2 - 7y_3 \\
\text{subject to} & -2y_1 - 2y_2 - y_3 \ge -1 \\
                  & -y_1 + 4y_2 + 3y_3 \ge -1 \\
                  & y_1,\ y_2,\ y_3 \ge 0.
\end{array}
$$
Introducing variables $w_i$, $i = 1, 2, 3$, for the primal slacks and $z_j$, $j = 1, 2$, for the
dual slacks, we can write down the initial primal and dual dictionaries:

$$
(P)\quad
\begin{array}{rcl}
\zeta &=& -x_1 - x_2 \\
w_1 &=& 4 + 2x_1 + x_2 \\
w_2 &=& -8 + 2x_1 - 4x_2 \\
w_3 &=& -7 + x_1 - 3x_2
\end{array}
$$

$$
(D)\quad
\begin{array}{rcl}
-\xi &=& -4y_1 + 8y_2 + 7y_3 \\
z_1 &=& 1 - 2y_1 - 2y_2 - y_3 \\
z_2 &=& 1 - y_1 + 4y_2 + 3y_3.
\end{array}
$$

As before, we have recorded the negative of the dual objective function, since we
prefer to maximize the objective function appearing in a dictionary. More
importantly, note that the dual dictionary is feasible, whereas the primal one is not. This
suggests that it would be sensible to apply the simplex method to the dual. Let us
do so, but as we go we keep track of the analogous pivots applied to the primal
dictionary. For example, the entering variable in the initial dual dictionary is $y_2$,
and the leaving variable then is $z_1$. Since $w_2$ is complementary to $y_2$ and $x_1$ is
complementary to $z_1$, we will use $w_2$ and $x_1$ as the entering/leaving variables in the
primal dictionary. Of course, since $w_2$ is basic and $x_1$ is nonbasic, $w_2$ must be the
leaving variable and $x_1$ the entering variable, i.e., the reverse of what we have for
the complementary variables in the dual dictionary. The result of these pivots is

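$$
(P)\quad
\begin{array}{rcl}
\zeta &=& -4 - 0.5w_2 - 3x_2 \\
w_1 &=& 12 + w_2 + 5x_2 \\
x_1 &=& 4 + 0.5w_2 + 2x_2 \\
w_3 &=& -3 + 0.5w_2 - x_2
\end{array}
$$

$$
(D)\quad
\begin{array}{rcl}
-\xi &=& 4 - 12y_1 - 4z_1 + 3y_3 \\
y_2 &=& 0.5 - y_1 - 0.5z_1 - 0.5y_3 \\
z_2 &=& 3 - 5y_1 - 2z_1 + y_3.
\end{array}
$$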

Continuing to work on the dual, we now see that $y_3$ is the entering variable and $y_2$
leaves. Hence, for the primal we use $w_3$ and $w_2$ as the leaving and entering variable,
respectively. After pivoting, we have

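$$
(P)\quad
\begin{array}{rcl}
\zeta &=& -7 - w_3 - 4x_2 \\
w_1 &=& 18 + 2w_3 + 7x_2 \\
x_1 &=& 7 + w_3 + 3x_2 \\
w_2 &=& 6 + 2w_3 + 2x_2
\end{array}
$$

$$
(D)\quad
\begin{array}{rcl}
-\xi &=& 7 - 18y_1 - 7z_1 - 6y_2 \\
y_3 &=& 1 - 2y_1 - z_1 - 2y_2 \\
z_2 &=& 4 - 7y_1 - 3z_1 - 2y_2.
\end{array}
$$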

Now  we  notice  that  both  dictionaries  are  optimal. 

Of course, in each of the above dictionaries, the table of numbers in each dual
dictionary is the negative transpose of the corresponding primal table. Therefore,
we never need to write the dual dictionary; the dual simplex method can be entirely
described in terms of the primal dictionaries. Indeed, first we note that the dictionary
must  be  dual  feasible.  This  means  that  all  the  coefficients  of  the  nonbasic  variables 
in  the  primal  objective  function  must  be  nonpositive.  Given  this,  we  proceed  as 
follows.  First  we  select  the  leaving  variable  by  picking  that  basic  variable  whose 
constant  term  in  the  dictionary  is  the  most  negative  (if  there  are  none,  then  the 
current  dictionary  is  optimal).  Then  we  pick  the  entering  variable  by  scanning  across 
this  row  of  the  dictionary  and  comparing  ratios  of  the  coefficients  in  this  row  to  the 
corresponding  coefficients  in  the  objective  row,  looking  for  the  largest  negated  ratio 
just  as  we  did  in  the  primal  simplex  method.  Once  the  entering  and  leaving  variable 
are  identified,  we  pivot  to  the  next  dictionary  and  continue  from  there.  The  reader  is 
encouraged  to  trace  the  pivots  in  the  above  example,  paying  particular  attention  to 
how  one  determines  the  entering  and  leaving  variables  by  looking  only  at  the  primal 
dictionary. 
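The selection rules above can be traced with a short program that works on the primal table alone. This is a sketch under stated assumptions, not the book's software: the table layout (constants in column 0, objective in row 0), the variable labels, and the function names are ours, and no anti-cycling safeguard is included. The entering variable is chosen by minimizing $-c_j / a_{rj}$ over the positive entries $a_{rj}$ of the leaving row, which selects the same column as the largest negated ratio $-a_{rj} / c_j$.

```python
from fractions import Fraction


def pivot(table, r, s):
    """Standard dictionary pivot on row r, column s."""
    a = Fraction(table[r][s])
    new = []
    for i, row in enumerate(table):
        new_row = []
        for j, entry in enumerate(row):
            if i == r and j == s:
                new_row.append(1 / a)
            elif i == r:
                new_row.append(-entry / a)
            elif j == s:
                new_row.append(entry / a)
            else:
                new_row.append(entry - table[r][j] * table[i][s] / a)
        new.append(new_row)
    return new


def dual_simplex(table, basic, nonbasic):
    """Dual simplex method on a primal dictionary that is dual feasible,
    i.e. whose objective coefficients table[0][1:] are all nonpositive."""
    T = [[Fraction(v) for v in row] for row in table]
    basic, nonbasic = list(basic), list(nonbasic)
    while True:
        # Leaving variable: basic variable with the most negative constant.
        r = min(range(1, len(T)), key=lambda i: T[i][0])
        if T[r][0] >= 0:
            return T, basic, nonbasic  # primal feasible too, hence optimal
        # Entering variable: ratio test across the leaving row.
        candidates = [j for j in range(1, len(T[0])) if T[r][j] > 0]
        if not candidates:
            raise ValueError("primal infeasible (dual unbounded)")
        s = min(candidates, key=lambda j: -T[0][j] / T[r][j])
        T = pivot(T, r, s)
        basic[r - 1], nonbasic[s - 1] = nonbasic[s - 1], basic[r - 1]


# Initial primal dictionary of this section.
T0 = [[0, -1, -1],
      [4, 2, 1],
      [-8, 2, -4],
      [-7, 1, -3]]
T, basic, nonbasic = dual_simplex(T0, ["w1", "w2", "w3"], ["x1", "x2"])
print(basic, nonbasic)  # ['w1', 'x1', 'w2'] ['w3', 'x2']
print(T[0][0])          # -7
```

The two pivots it performs (first $w_2$ leaves and $x_1$ enters, then $w_3$ leaves and $w_2$ enters) are the ones traced above, and the final table matches the last primal dictionary.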


7.  A  Dual-Based  Phase  I  Algorithm 

The  dual  simplex  method  described  in  the  previous  section  provides  us  with  a 
new  Phase  I  algorithm,  which  if  nothing  else  is  at  least  more  elegant  than  the  one 
we  gave  in  Chapter  2.  Let  us  illustrate  it  using  an  example: 

$$
\begin{array}{ll}
\text{maximize}   & -x_1 + 4x_2 \\
\text{subject to} & -2x_1 - x_2 \le 4 \\
                  & -2x_1 + 4x_2 \le -8 \\
                  & -x_1 + 3x_2 \le -7 \\
                  & x_1,\ x_2 \ge 0.
\end{array}
$$

The primal dictionary for this problem is

$$
(P)\quad
\begin{array}{rcl}
\zeta &=& -x_1 + 4x_2 \\
w_1 &=& 4 + 2x_1 + x_2 \\
w_2 &=& -8 + 2x_1 - 4x_2 \\
w_3 &=& -7 + x_1 - 3x_2,
\end{array}
$$

and even though at this point we realize that we don't need to look at the dual
dictionary, let's track it anyway:

$$
(D)\quad
\begin{array}{rcl}
-\xi &=& -4y_1 + 8y_2 + 7y_3 \\
z_1 &=& 1 - 2y_1 - 2y_2 - y_3 \\
z_2 &=& -4 - y_1 + 4y_2 + 3y_3.
\end{array}
$$

Clearly,  neither  the  primal  nor  the  dual  dictionary  is  feasible.  But  by  changing  the 
primal  objective  function,  we  can  easily  produce  a  dual  feasible  dictionary.  For 
example,  let  us  temporarily  change  the  primal  objective  function  to 

$$\eta = -x_1 - x_2.$$

Then  the  corresponding  initial  dual  dictionary  is  feasible.  In  fact,  it  coincides  with 
the  dual  dictionary  we  considered  in  the  previous  section,  so  we  already  know  the 
optimal solution for this modified problem. The optimal primal dictionary is

$$
\begin{aligned}
\eta &= -7 - w_3 - 4x_2 \\
w_1 &= 18 + 2w_3 + 7x_2 \\
x_1 &= 7 + w_3 + 3x_2 \\
w_2 &= 6 + 2w_3 + 2x_2.
\end{aligned}
$$

This  primal  dictionary  is  optimal  for  the  modified  problem  but  not  for  the  original 
problem.  However,  it  is  feasible  for  the  original  problem,  and  we  can  now  simply 
reinstate  the  intended  objective  function  and  continue  with  Phase  II.  Indeed, 

$$
\begin{aligned}
\zeta &= -x_1 + 4x_2 \\
&= -(7 + w_3 + 3x_2) + 4x_2 \\
&= -7 - w_3 + x_2.
\end{aligned}
$$

Hence,  the  starting  dictionary  for  Phase  II  is 

$$
\begin{aligned}
\zeta &= -7 - w_3 + x_2 \\
w_1 &= 18 + 2w_3 + 7x_2 \\
x_1 &= 7 + w_3 + 3x_2 \\
w_2 &= 6 + 2w_3 + 2x_2.
\end{aligned}
$$

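The entering variable is $x_2$. Looking for a leaving variable, we find that every
coefficient of $x_2$ in the constraint rows is positive, so no basic variable decreases as
$x_2$ increases, and we discover that this
problem is unbounded. Of course, more typically one would expect to have to do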
several  iterations  of  Phase  II  to  find  the  optimal  solution  (or  show  unboundedness). 
Here  we  just  got  lucky  that  the  game  ended  so  soon. 
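
The whole dual-primal computation on this example can be reproduced with a small exact-arithmetic sketch. The code below is an illustration under stated assumptions, not the software that accompanies the book: the helper names (`pivot`, `dual_simplex`, `primal_simplex`) and the row layout are invented here, the temporary objective $\eta$ and the original objective $\zeta$ are both carried through every pivot (which has the same effect as substituting into $\zeta$ after Phase I), and no anti-cycling rule is included.

```python
from fractions import Fraction

def pivot(rows, basis, nonbasis, r, s):
    """Swap basis[r] with nonbasis[s]. Each row is [const, c_1, ..., c_n],
    meaning  variable = const + sum_j c_j * nonbasis[j]."""
    a = rows[r][s + 1]
    new = [-v / a for v in rows[r]]   # row r solved for the entering variable
    new[s + 1] = 1 / a                # coefficient on the leaving variable
    for i, row in enumerate(rows):
        if i != r:
            k = row[s + 1]
            rows[i] = [v + k * w for v, w in zip(row, new)]
            rows[i][s + 1] = k * new[s + 1]
    rows[r] = new
    basis[r], nonbasis[s] = nonbasis[s], basis[r]

def dual_simplex(rows, basis, nonbasis, obj, first):
    """Phase I: make all constants nonnegative, keeping row `obj` dual feasible."""
    while True:
        r = min(range(first, len(rows)), key=lambda i: rows[i][0])
        if rows[r][0] >= 0:
            return "feasible"
        cols = [j for j in range(len(nonbasis)) if rows[r][j + 1] > 0]
        if not cols:
            return "infeasible"
        s = min(cols, key=lambda j: -rows[obj][j + 1] / rows[r][j + 1])
        pivot(rows, basis, nonbasis, r, s)

def primal_simplex(rows, basis, nonbasis, obj, first):
    """Phase II: largest-coefficient rule on objective row `obj`."""
    while True:
        cols = [j for j in range(len(nonbasis)) if rows[obj][j + 1] > 0]
        if not cols:
            return "optimal"
        s = max(cols, key=lambda j: rows[obj][j + 1])
        cand = [i for i in range(first, len(rows)) if rows[i][s + 1] < 0]
        if not cand:
            return "unbounded"
        r = min(cand, key=lambda i: rows[i][0] / -rows[i][s + 1])
        pivot(rows, basis, nonbasis, r, s)

rows = [[Fraction(v) for v in row] for row in [
    [0, -1, -1],    # eta  = -x1 - x2   (temporary Phase I objective)
    [0, -1,  4],    # zeta = -x1 + 4x2  (original objective, carried along)
    [4,  2,  1],    # w1 =  4 + 2x1 +  x2
    [-8, 2, -4],    # w2 = -8 + 2x1 - 4x2
    [-7, 1, -3],    # w3 = -7 +  x1 - 3x2
]]
basis = ["eta", "zeta", "w1", "w2", "w3"]
nonbasis = ["x1", "x2"]

phase1 = dual_simplex(rows, basis, nonbasis, obj=0, first=2)
zeta_row = list(rows[1])          # zeta expressed in the Phase I final basis
phase2 = primal_simplex(rows, basis, nonbasis, obj=1, first=2)
print(phase1, basis, nonbasis, zeta_row, phase2)
```

Phase I takes two pivots here ($w_2$ leaves and $x_1$ enters, then $w_3$ leaves and $w_2$ enters) and ends with nonbasic variables $w_3$ and $x_2$; Phase II then reports unboundedness without pivoting, in agreement with the text.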

It is interesting to note how we detect infeasibility with this new Phase I algorithm.
The modified problem is guaranteed always to be dual feasible. It is easy to
see  that  the  primal  problem  is  infeasible  if  and  only  if  the  modified  problem  is  dual 
unbounded  (which  the  dual  simplex  method  will  detect  just  as  the  primal  simplex 
method  detects  primal  unboundedness). 

The  two-phase  algorithm  we  have  just  presented  can  be  thought  of  as  a  dual- 
primal  algorithm,  since  we  first  apply  the  dual  simplex  method  to  a  modified  dual 
feasible  problem  and  then  finish  off  by  applying  the  primal  simplex  method  to  the 
original  problem,  starting  from  the  feasible  dictionary  produced  by  Phase  I.  One 
could  consider  turning  this  around  and  doing  a  primal-dual  two-phase  algorithm. 
Here,  the  right-hand  side  of  the  primal  problem  would  be  modified  to  produce  an 
obvious  primal  feasible  solution.  The  primal  simplex  method  would  then  be  applied. 
The  optimal  solution  to  this  primal  problem  will  then  be  feasible  for  the  original 
dual  problem  but  will  not  be  optimal  for  it.  But  then  the  dual  simplex  method  can 
be  applied,  starting  with  this  dual  feasible  basis  until  an  optimal  solution  for  the  dual 
problem  is  obtained. 


8.  The  Dual  of  a  Problem  in  General  Form 

In Chapter 1, we saw that linear programming problems can be formulated in
a variety of ways. In this section, we derive the form of the dual when the primal
problem is not necessarily presented in standard form.

First, let us consider the case where the linear constraints are equalities (and the
variables are nonnegative):

$$
\begin{aligned}
\text{maximize}\quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij} x_j = b_i, \qquad i = 1, 2, \ldots, m \\
& x_j \ge 0, \qquad j = 1, 2, \ldots, n.
\end{aligned}
\tag{5.11}
$$

As we mentioned in Chapter 1, this problem can be reformulated with inequality
constraints by simply writing each equality as two inequalities: one greater-than-or-equal-to
and one less-than-or-equal-to:

$$
\begin{aligned}
\text{maximize}\quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij} x_j \le b_i, \qquad i = 1, 2, \ldots, m \\
& \sum_{j=1}^{n} a_{ij} x_j \ge b_i, \qquad i = 1, 2, \ldots, m \\
& x_j \ge 0, \qquad j = 1, 2, \ldots, n.
\end{aligned}
$$

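Then negating each greater-than-or-equal-to constraint, we can put the problem into
standard form:

$$
\begin{aligned}
\text{maximize}\quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij} x_j \le b_i, \qquad i = 1, 2, \ldots, m \\
& \sum_{j=1}^{n} -a_{ij} x_j \le -b_i, \qquad i = 1, 2, \ldots, m \\
& x_j \ge 0, \qquad j = 1, 2, \ldots, n.
\end{aligned}
$$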

Now that the problem is in standard form, we can write down its dual. Since there
are two sets of $m$ inequality constraints, we need two sets of $m$ dual variables.
Let’s denote the dual variables associated with the first set of $m$ constraints by $y_i^+$,
$i = 1, 2, \ldots, m$, and the remaining dual variables by $y_i^-$, $i = 1, 2, \ldots, m$. With
these notations, the dual problem is

$$
\begin{aligned}
\text{minimize}\quad & \sum_{i=1}^{m} b_i y_i^+ - \sum_{i=1}^{m} b_i y_i^- \\
\text{subject to}\quad & \sum_{i=1}^{m} y_i^+ a_{ij} - \sum_{i=1}^{m} y_i^- a_{ij} \ge c_j, \qquad j = 1, 2, \ldots, n \\
& y_i^+,\ y_i^- \ge 0, \qquad i = 1, 2, \ldots, m.
\end{aligned}
$$

    Primal                    Dual
    ------------------------  ------------------------
    Equality constraint       Free variable
    Inequality constraint     Nonnegative variable
    Free variable             Equality constraint
    Nonnegative variable      Inequality constraint

Table 5.1. Rules for forming the dual.


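A moment’s reflection reveals that we can simplify this problem. If we put

$$y_i = y_i^+ - y_i^-, \qquad i = 1, 2, \ldots, m,$$

the dual problem reduces to

$$
\begin{aligned}
\text{minimize}\quad & \sum_{i=1}^{m} b_i y_i \\
\text{subject to}\quad & \sum_{i=1}^{m} y_i a_{ij} \ge c_j, \qquad j = 1, 2, \ldots, n.
\end{aligned}
$$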

This problem is the dual associated with (5.11). Note what has changed from when
we were considering problems in standard form: now the dual variables are not
restricted to be nonnegative. And that is the message: equality constraints in the
primal yield unconstrained variables (also referred to as free variables) in the dual,
whereas inequality constraints in the primal yield nonnegative variables in the dual.
Employing  the  symmetry  between  the  primal  and  the  dual,  we  can  say  more:  free 
variables  in  the  primal  yield  equality  constraints  in  the  dual,  whereas  nonnegative 
variables  in  the  primal  yield  inequality  constraints  in  the  dual.  These  rules  are 
summarized  in  Table  5.1. 
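
The rules in Table 5.1 are mechanical enough to express as a short function. The sketch below is only an illustration: the function name `form_dual`, the string labels (`"<="`, `"="`, `"nonneg"`, `"free"`), and the small test problem are all assumptions introduced here, and the primal is assumed to be a maximization whose inequality constraints are all of less-than-or-equal-to type.

```python
def form_dual(c, A, b, con_types, var_types):
    """Form the dual of:  maximize c.x  subject to  (A x)_i  <= or =  b_i,
    with each x_j either "nonneg" or "free".

    Returns the data of:  minimize b.y  subject to  (A^T y)_j  >= or =  c_j,
    with each y_i either "nonneg" or "free", following Table 5.1.
    """
    m, n = len(A), len(A[0])
    transpose = [[A[i][j] for i in range(m)] for j in range(n)]
    # Equality constraint -> free dual variable; inequality -> nonnegative.
    dual_var_types = ["free" if t == "=" else "nonneg" for t in con_types]
    # Free primal variable -> equality dual constraint; nonnegative -> inequality.
    dual_con_types = ["=" if t == "free" else ">=" for t in var_types]
    return {
        "objective": list(b),
        "A": transpose,
        "rhs": list(c),
        "con_types": dual_con_types,
        "var_types": dual_var_types,
    }

# Hypothetical primal: maximize x1 + 2x2
#   subject to  x1 + x2  = 4,   x1 - x2 <= 1,   x1 >= 0,  x2 free.
dual = form_dual(c=[1, 2], A=[[1, 1], [1, -1]], b=[4, 1],
                 con_types=["=", "<="], var_types=["nonneg", "free"])
print(dual)
```

For this made-up problem the dual reads: minimize $4y_1 + y_2$ subject to $y_1 + y_2 \ge 1$, $y_1 - y_2 = 2$, with $y_1$ free and $y_2 \ge 0$.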

9.  Resource  Allocation  Problems 

Let us return to the production facility problem studied in Chapter 1. Recall
that this problem involves a production facility that can take a variety of raw materials
(enumerated $i = 1, 2, \ldots, m$) and turn them into a variety of final products
(enumerated $j = 1, 2, \ldots, n$). We assume as before that the current market value of
a unit of the $i$th raw material is $\rho_i$, that the current market price for a unit of the $j$th
product is $\sigma_j$, that producing one unit of product $j$ requires $a_{ij}$ units of raw material
$i$, and that at the current moment in time the facility has on hand $b_i$ units of the $i$th
raw material.

The current market values/prices are, by definition, related to each other by the
formulas

$$\sigma_j = \sum_i \rho_i a_{ij}, \qquad j = 1, 2, \ldots, n.$$

These  equations  hold  whenever  the  market  is  in  equilibrium.  (Of  course,  it  is 
crucial  to  assume  here  that  the  collection  of  “raw  materials”  appearing  on  the  right- 
hand side is exhaustive, including such items as depreciation of fixed assets and
physical labor.) In the real world, the market is always essentially in equilibrium.
Nonetheless,  it  continually  experiences  small  perturbations  that  ripple  through  it  and 
move  the  equilibrium  to  new  levels. 

These perturbations can be from several causes, an important one being innovation.
One possible innovation is to improve the production process. This means
that the values of some of the $a_{ij}$'s are reduced. Now, suddenly there is a windfall
profit for each unit of product $j$ produced. This windfall profit is given by

$$c_j = \sigma_j - \sum_i \rho_i a_{ij}. \tag{5.12}$$

Of  course,  eventually  most  producers  of  these  products  will  take  advantage  of  the 
same  innovation,  and  once  the  suppliers  get  wind  of  the  profits  being  made,  they 
will  get  in  on  the  action  by  raising  the  price  of  the  raw  materials.  Nonetheless, 
there  is  always  a  time  lag;  it  is  during  this  time  that  fortunes  are  made. 

To be concrete, let us assume that the time lag is about 1 month (depending on
the industry, this lag time could be considered too short or too long). Suppose also
that the production manager decides to produce $x_j$ units of product $j$ and that all
units produced are sold immediately at their market value. Then the total revenue
during this month will be $\sum_j \sigma_j x_j$. The value of the raw materials on hand at the
beginning of the month was $\sum_i \rho_i b_i$. Also, if we denote the new price levels for the
raw materials at the end of the month by $w_i$, $i = 1, 2, \ldots, m$, then the value of any
remaining inventory at the end of the month is given by

$$\sum_i w_i \Bigl( b_i - \sum_j a_{ij} x_j \Bigr)$$

(if any term is negative, then it represents the cost of purchasing additional raw materials
to meet the month’s production requirements; we assume that these additional
purchases are made at the new, higher, end-of-month price). The total windfall, call
it $\pi$, (over all products) for this month can now be written as

$$\pi = \sum_j \sigma_j x_j + \sum_i w_i \Bigl( b_i - \sum_j a_{ij} x_j \Bigr) - \sum_i \rho_i b_i. \tag{5.13}$$

Our aim is to choose production levels $x_j$, $j = 1, 2, \ldots, n$, that maximize this
windfall. But our supplier’s aim is to choose prices $w_i$, $i = 1, 2, \ldots, m$, so as to
minimize our windfall. Before studying these optimizations, let us first rewrite the
windfall in a more convenient form. As in Chapter 1, let $y_i$ denote the increase in
the price of raw material $i$. That is,

$$w_i = \rho_i + y_i. \tag{5.14}$$


¹One could take the prices of raw materials as fixed and argue that the value of the final products
will fall. It doesn’t really matter which view one adopts, since prices are relative anyway. The point is
simply that the difference between the price of the raw materials and the price of the final products must
narrow due to this innovation.

Substituting (5.14) into (5.13) and then simplifying notations using (5.12), we see
that

$$\pi = \sum_j c_j x_j + \sum_i y_i \Bigl( b_i - \sum_j a_{ij} x_j \Bigr). \tag{5.15}$$

To emphasize that $\pi$ depends on each of the $x_j$'s and on the $y_i$'s, we sometimes
write it as $\pi(x_1, \ldots, x_n, y_1, \ldots, y_m)$.

Now let us return to the competing optimizations. Given $x_j$ for $j = 1, 2, \ldots, n$,
the suppliers react to minimize $\pi(x_1, \ldots, x_n, y_1, \ldots, y_m)$. Looking at (5.15), we
see that for any resource $i$ in short supply, that is,

$$b_i - \sum_j a_{ij} x_j < 0,$$

the suppliers will jack up the price immensely (i.e., $y_i = \infty$). To avoid this obviously
bad situation, the production manager will be sure to set the production levels
so that

$$\sum_j a_{ij} x_j \le b_i, \qquad i = 1, 2, \ldots, m.$$

On the other hand, for any resource $i$ that is not exhausted during the windfall month,
that is,

$$b_i - \sum_j a_{ij} x_j > 0,$$

the suppliers will have no incentive to change the prevailing market price (i.e.,
$y_i = 0$). Therefore, from the production manager’s point of view, the problem reduces
to one of maximizing

$$\sum_j c_j x_j$$

subject to the constraints that

$$
\begin{aligned}
\sum_j a_{ij} x_j &\le b_i, \qquad i = 1, 2, \ldots, m, \\
x_j &\ge 0, \qquad j = 1, 2, \ldots, n.
\end{aligned}
$$


This  is  just  our  usual  primal  linear  programming  problem.  This  is  the  problem  that 
the  production  manager  needs  to  solve  in  anticipation  of  adversarial  suppliers. 

Now let us look at the problem from the suppliers’ point of view. Rearranging
the terms in (5.15) by writing

$$\pi = \sum_j \Bigl( c_j - \sum_i y_i a_{ij} \Bigr) x_j + \sum_i y_i b_i, \tag{5.16}$$

we see that if the suppliers set prices in such a manner that a windfall remains on
the $j$th product even after the price adjustment, that is,

$$c_j - \sum_i y_i a_{ij} > 0,$$

then the production manager would be able to generate for the facility an arbitrarily
large windfall by producing a huge amount of the $j$th product (i.e., $x_j = \infty$). We
assume that this is unacceptable to the suppliers, and so they will determine their
price increases so that

$$\sum_i y_i a_{ij} \ge c_j, \qquad j = 1, 2, \ldots, n.$$

Also, if the suppliers set the price increases too high so that the production facility
will lose money by producing product $j$, that is,

$$c_j - \sum_i y_i a_{ij} < 0,$$

then the production manager would simply decide not to engage in that activity.
That is, she would set $x_j = 0$. Hence, the first term in (5.16) will always be zero,
and so the optimization problem faced by the suppliers is to minimize

$$\sum_i b_i y_i$$

subject to the constraints that

$$
\begin{aligned}
\sum_i y_i a_{ij} &\ge c_j, \qquad j = 1, 2, \ldots, n, \\
y_i &\ge 0, \qquad i = 1, 2, \ldots, m.
\end{aligned}
$$

This  is  precisely  the  dual  of  the  production  manager’s  problem! 

As we’ve seen earlier with the strong duality theorem, if the production manager’s
problem has an optimal solution, then so does the suppliers’ problem, and
the two objectives agree. This means that an equilibrium can be reestablished by
setting the production levels and the price hikes according to the optimal solutions
to these two linear programming problems.
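
A small numerical check can make the equilibrium concrete. The sketch below evaluates the windfall both in the form (5.15) and in the rearranged form (5.16). The data `c`, `A`, `b` and the candidate solutions `x_opt`, `y_opt` are made up for illustration; they were chosen so that both are feasible with equal objective values, which by weak duality certifies that each is optimal.

```python
def windfall(c, A, b, x, y):
    """pi(x, y) grouped as in (5.15): revenue plus priced leftover inventory."""
    n, m = len(c), len(b)
    revenue = sum(c[j] * x[j] for j in range(n))
    leftover = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    return revenue + sum(y[i] * leftover[i] for i in range(m))

def windfall_by_product(c, A, b, x, y):
    """The same quantity grouped as in (5.16): per-product margins plus y.b."""
    n, m = len(c), len(b)
    margin = [c[j] - sum(y[i] * A[i][j] for i in range(m)) for j in range(n)]
    return sum(margin[j] * x[j] for j in range(n)) + sum(y[i] * b[i] for i in range(m))

# Hypothetical data: 3 raw materials, 3 products.
c = [3, 2, 1]
A = [[1, 1, 1],
     [2, 1, 1],
     [1, 0, 0]]
b = [4, 5, 3]
x_opt = [1, 3, 0]   # production levels (primal feasible)
y_opt = [1, 1, 0]   # price increases  (dual feasible)

primal_value = sum(cj * xj for cj, xj in zip(c, x_opt))
dual_value = sum(bi * yi for bi, yi in zip(b, y_opt))
print(primal_value, dual_value, windfall(c, A, b, x_opt, y_opt))

# Unilateral deviations do not help either side:
print(windfall(c, A, b, x_opt, [1, 1, 5]))   # suppliers raise the price of an unexhausted resource
print(windfall(c, A, b, [1, 3, 2], y_opt))   # manager makes a product whose margin is negative
```

With these numbers the primal value, the dual value, and the windfall at the pair all coincide. Raising the price of the resource that is not exhausted only increases the windfall, and producing the unprofitable third product only decreases it, which is the saddle-point behavior the argument relies on.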

10.  Lagrangian  Duality 

The analysis of the preceding section is an example of a general technique that
forms the foundation of a subject called Lagrangian duality, which we shall briefly
describe.

Let us start by summarizing what we did. It was quite simple. The analysis
revolved around a function

$$\pi(x_1, \ldots, x_n, y_1, \ldots, y_m) = \sum_j c_j x_j - \sum_i \sum_j y_i a_{ij} x_j + \sum_i y_i b_i.$$

To streamline notations, let $x$ stand for the entire collection of variables $x_1, x_2,
\ldots, x_n$ and let $y$ stand for the collection of $y_i$'s so that we can write $\pi(x, y)$ in
place of $\pi(x_1, \ldots, x_n, y_1, \ldots, y_m)$. Written with these notations, we showed in the
previous section that

$$\max_{x \ge 0} \min_{y \ge 0} \pi(x, y) = \min_{y \ge 0} \max_{x \ge 0} \pi(x, y).$$

We also showed that the inner optimization could in both cases be solved explicitly,
that the max-min problem reduced to a linear programming problem, and that the
min-max problem reduced to the dual linear programming problem.

One could imagine trying to carry out the same program for functions $\pi$ that
don’t  necessarily  have  the  form  shown  above.  In  the  general  case,  one  needs  to 
consider  each  step  carefully.  The  max-min  problem  is  called  the  primal  problem, 
and  the  min-max  problem  is  called  the  dual  problem.  However,  it  may  or  may  not 
be  true  that  these  two  problems  have  the  same  optimal  objective  values.  In  fact,  the 
subject is interesting because one can indeed state specific, verifiable conditions
for  which  the  two  problems  do  agree.  Also,  one  would  like  to  be  able  to  solve  the 
inner  optimizations  explicitly  so  that  the  primal  problem  can  be  stated  as  a  pure 
maximization  problem  and  the  dual  can  be  stated  as  a  pure  minimization  problem. 
This,  too,  is  often  doable.  There  are  various  ways  in  which  one  can  extend  the 
notions  of  duality  beyond  the  context  of  linear  programming.  The  one  just  described 
is  referred  to  as  Lagrangian  duality.  It  is  perhaps  the  most  important  such  extension. 
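
To see that the max-min and min-max values need not agree for a general $\pi$, it helps to look at the simplest possible setting. The sketch below assumes, purely for illustration, that $x$ and $y$ each range over finitely many values, so that $\pi$ is just a table `payoff[x][y]`; the two tables are made-up examples, one with a saddle point and one without.

```python
def max_min(payoff):
    """max over rows x of (min over columns y of payoff[x][y])."""
    return max(min(row) for row in payoff)

def min_max(payoff):
    """min over columns y of (max over rows x of payoff[x][y])."""
    return min(max(col) for col in zip(*payoff))

saddle = [[2, 3],
          [1, 4]]        # the entry 2 is smallest in its row and largest in its column
no_saddle = [[1, -1],
             [-1, 1]]    # no entry has that property

for name, pi in (("saddle", saddle), ("no_saddle", no_saddle)):
    print(name, max_min(pi), min_max(pi))
```

In both tables the max-min value is at most the min-max value, which holds in general. The two are equal only for the first table, so equality is a property that has to be established for the particular function at hand.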

Exercises 

In  solving  the  following  problems,  the  advanced  pivot  tool  can  be  used  to  check 
your  arithmetic: 

www.princeton.edu/~rvdb/JAVA/pivot/advanced.html 

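5.1 What is the dual of the following linear programming problem:

$$
\begin{aligned}
\text{maximize}\quad & x_1 - 2x_2 \\
\text{subject to}\quad & x_1 + 2x_2 - x_3 + x_4 \ge 0 \\
& 4x_1 + 3x_2 + 4x_3 - 2x_4 \le 3 \\
& -x_1 - x_2 + 2x_3 + x_4 = 1 \\
& x_2, x_3 \ge 0.
\end{aligned}
$$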

5.2  Illustrate  Theorem  5.2  on  the  problem  in  Exercise  2.9. 

5.3  Illustrate  Theorem  5.2  on  the  problem  in  Exercise  2.1. 

5.4  Illustrate  Theorem  5.2  on  the  problem  in  Exercise  2.2. 

5.5 Consider the following linear programming problem:

$$
\begin{aligned}
\text{maximize}\quad & 2x_1 + 8x_2 - x_3 - 2x_4 \\
\text{subject to}\quad & 2x_1 + 3x_2 + 6x_4 \le 6 \\
& -2x_1 + 4x_2 + 3x_3 \le 1.5 \\
& 3x_1 + 2x_2 - 2x_3 - 4x_4 \le 4 \\
& x_1, x_2, x_3, x_4 \ge 0.
\end{aligned}
$$

Suppose that, in solving this problem, you have arrived at the following
dictionary:

$$
\begin{aligned}
\zeta &= 3.5 - 0.25w_1 + 6.25x_2 - 0.5w_3 - 1.5x_4 \\
x_1 &= 3.0 - 0.5w_1 - 1.5x_2 - 3.0x_4 \\
w_2 &= 0.0 + 1.25w_1 - 3.25x_2 - 1.5w_3 + 13.5x_4 \\
x_3 &= 2.5 - 0.75w_1 - 1.25x_2 + 0.5w_3 - 6.5x_4.
\end{aligned}
$$

(a) Write down the dual problem.

(b)  In  the  dictionary  shown  above,  which  variables  are  basic?  Which  are 
nonbasic? 

(c) Write down the primal solution corresponding to the given dictionary.
Is it feasible? Is it degenerate?

(d)  Write  down  the  corresponding  dual  dictionary. 

(e)  Write  down  the  dual  solution.  Is  it  feasible? 

(f) Do the primal/dual solutions you wrote above satisfy the complementary
slackness property?

(g)  Is  the  current  primal  solution  optimal? 

(h) For the next (primal) pivot, which variable will enter if the largest
coefficient rule is used? Which will leave? Will the pivot be degenerate?

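5.6 Solve the following linear program:

$$
\begin{aligned}
\text{maximize}\quad & -x_1 - 2x_2 \\
\text{subject to}\quad & -2x_1 + 7x_2 \le 6 \\
& -3x_1 + x_2 \le -1 \\
& 9x_1 - 4x_2 \le 6 \\
& x_1 - x_2 \le 1 \\
& 7x_1 - 3x_2 \le 6 \\
& -5x_1 + 2x_2 \le -3 \\
& x_1, x_2 \ge 0.
\end{aligned}
$$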

5.7  Solve  the  linear  program  given  in  Exercise  2.3  using  the  dual-primal  two- 
phase  algorithm. 

5.8  Solve  the  linear  program  given  in  Exercise  2.4  using  the  dual-primal  two- 
phase  algorithm. 

5.9  Solve  the  linear  program  given  in  Exercise  2.6  using  the  dual-primal  two- 
phase  algorithm. 

5.10  Using  today’s  date  (MMYY)  for  the  seed  value,  solve  10  problems  using 
the  dual  phase  I  primal  phase  II  simplex  method: 

www.princeton.edu/~rvdb/JAVA/pivot/dp2phase.html

5.11  Using  today’s  date  (MMYY)  for  the  seed  value,  solve  10  problems  using 
the  primal  phase  I  dual  phase  II  simplex  method: 

www.princeton.edu/~rvdb/JAVA/pivot/pd2phase.html

5.12 For $x$ and $y$ in $\mathbb{R}$, compute

$$\max_{x \ge 0} \min_{y \ge 0} (x - y) \qquad \text{and} \qquad \min_{y \ge 0} \max_{x \ge 0} (x - y)$$

and note whether or not they are equal.

5.13 Consider the following process. Starting with a linear programming problem
in standard form,

$$
\begin{aligned}
\text{maximize}\quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij} x_j \le b_i, \qquad i = 1, 2, \ldots, m \\
& x_j \ge 0, \qquad j = 1, 2, \ldots, n,
\end{aligned}
$$

first form its dual:

$$
\begin{aligned}
\text{minimize}\quad & \sum_{i=1}^{m} b_i y_i \\
\text{subject to}\quad & \sum_{i=1}^{m} y_i a_{ij} \ge c_j, \qquad j = 1, 2, \ldots, n \\
& y_i \ge 0, \qquad i = 1, 2, \ldots, m.
\end{aligned}
$$

Then replace the minimization in the dual with a maximization to get a
new linear programming problem, which we can write in standard form
as follows:

$$
\begin{aligned}
\text{maximize}\quad & \sum_{i=1}^{m} b_i y_i \\
\text{subject to}\quad & \sum_{i=1}^{m} -y_i a_{ij} \le -c_j, \qquad j = 1, 2, \ldots, n \\
& y_i \ge 0, \qquad i = 1, 2, \ldots, m.
\end{aligned}
$$

If we identify a linear programming problem with its data, $(a_{ij}, b_i, c_j)$,
the above process can be thought of as a transformation $T$ on the space of
data defined by

$$(a_{ij}, b_i, c_j) \xrightarrow{\;T\;} (-a_{ji}, -c_j, b_i).$$

Let $\zeta^*(a_{ij}, b_i, c_j)$ denote the optimal objective function value of the
standard-form linear programming problem having data $(a_{ij}, b_i, c_j)$.
By strong duality together with the fact that a maximization dominates
a minimization, it follows that

$$\zeta^*(a_{ij}, b_i, c_j) \le \zeta^*(-a_{ji}, -c_j, b_i).$$

Now if we repeat this process, we get

$$
\begin{aligned}
(a_{ij}, b_i, c_j) &\xrightarrow{\;T\;} (-a_{ji}, -c_j, b_i) \\
&\xrightarrow{\;T\;} (a_{ij}, -b_i, -c_j) \\
&\xrightarrow{\;T\;} (-a_{ji}, c_j, -b_i) \\
&\xrightarrow{\;T\;} (a_{ij}, b_i, c_j)
\end{aligned}
$$

and hence that

$$
\begin{aligned}
\zeta^*(a_{ij}, b_i, c_j) &\le \zeta^*(-a_{ji}, -c_j, b_i) \\
&\le \zeta^*(a_{ij}, -b_i, -c_j) \\
&\le \zeta^*(-a_{ji}, c_j, -b_i) \\
&\le \zeta^*(a_{ij}, b_i, c_j).
\end{aligned}
$$

But the first and the last entry in this chain of inequalities are equal. Therefore,
all these inequalities would seem to be equalities. While this outcome
could happen sometimes, it certainly isn’t always true. What is the
error in this logic? Can you state a (correct) nontrivial theorem that follows
from this line of reasoning? Can you give an example where the four
inequalities are indeed all equalities?

5.14 Consider the following variant of the resource allocation problem:

$$
\begin{aligned}
\text{maximize}\quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij} x_j \le b_i, \qquad i = 1, 2, \ldots, m \\
& 0 \le x_j \le u_j, \qquad j = 1, 2, \ldots, n.
\end{aligned}
\tag{5.17}
$$

As usual, the $c_j$'s denote the unit prices for the products and the $b_i$'s denote
the number of units on hand for each raw material. In this variant, the $u_j$'s
denote upper bounds on the number of units of each product that can be
sold at the set price. Now, let’s assume that the raw materials have not
been purchased yet and it is part of the problem to determine the $b_i$'s. Let
$p_i$, $i = 1, 2, \ldots, m$, denote the price for raw material $i$. The problem then
becomes an optimization over both the $x_j$'s and the $b_i$'s:

$$
\begin{aligned}
\text{maximize}\quad & \sum_{j=1}^{n} c_j x_j - \sum_{i=1}^{m} p_i b_i \\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij} x_j - b_i \le 0, \qquad i = 1, 2, \ldots, m \\
& 0 \le x_j \le u_j, \qquad j = 1, 2, \ldots, n \\
& b_i \ge 0, \qquad i = 1, 2, \ldots, m.
\end{aligned}
$$

(a) Show that this problem always has an optimal solution.

(b) Let $y_i^*(b)$, $i = 1, 2, \ldots, m$, denote optimal dual variables for the
original resource allocation problem (5.17). Note that we’ve explicitly
indicated that these dual variables depend on the $b$'s. Also, we
assume that problem (5.17) is both primal and dual non-degenerate
so the $y_i^*(b)$ is uniquely defined. Show that the optimal value of the
$b_i$'s, call them $b_i^*$'s, satisfy

$$y_i^*(b^*) = p_i.$$

Hint: You will need to use the fact that, for resource allocation problems,
we have $a_{ij} \ge 0$ for all $i$, and all $j$.


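5.15 Consider the following linear program:

$$
\begin{aligned}
\text{maximize}\quad & \sum_{j=1}^{n} p_j x_j \\
\text{subject to}\quad & \sum_{j=1}^{n} q_j x_j \le \beta \\
& x_j \le 1, \qquad j = 1, 2, \ldots, n \\
& x_j \ge 0, \qquad j = 1, 2, \ldots, n.
\end{aligned}
$$

Here, the numbers $p_j$, $j = 1, 2, \ldots, n$, are positive and sum to one. The
same is true of the $q_j$'s:

$$\sum_{j=1}^{n} q_j = 1, \qquad q_j > 0.$$

Furthermore, assume that

$$\frac{p_1}{q_1} \le \frac{p_2}{q_2} \le \cdots \le \frac{p_n}{q_n}$$

and that the parameter $\beta$ is a small positive number. Let $k = \min\{j :
q_{j+1} + \cdots + q_n \le \beta\}$. Let $y_0$ denote the dual variable associated with the
constraint involving $\beta$, and let $y_j$ denote the dual variable associated with
the upper bound of 1 on variable $x_j$. Using duality theory, show that the
optimal values of the primal and dual variables are given by

$$
x_j = \begin{cases}
0, & j < k \\[4pt]
\dfrac{\beta - q_{k+1} - \cdots - q_n}{q_k}, & j = k \\[8pt]
1, & j > k,
\end{cases}
\qquad
y_j = \begin{cases}
\dfrac{p_k}{q_k}, & j = 0 \\[8pt]
0, & 0 < j \le k \\[4pt]
q_j \left( \dfrac{p_j}{q_j} - \dfrac{p_k}{q_k} \right), & j > k.
\end{cases}
$$

See Exercise 1.3 for the motivation for this problem.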


5.16 Diet Problem. An MIT graduate student was trying to make ends meet on
a very small stipend. He went to the library and looked up the
National Research Council’s publication entitled “Recommended Dietary
Allowances” and was able to determine a minimum daily intake quantity
of each essential nutrient for a male in his weight and age category. Let $m$
denote the number of nutrients that he identified as important to his diet,
and let $b_i$ for $i = 1, 2, \ldots, m$ denote his personal minimum daily requirements.
Next, he made a list of his favorite foods (which, except for pizza
and due mostly to laziness and ineptitude in the kitchen, consisted almost
entirely of frozen prepared meals). He then went to the local grocery store
and made a list of the unit price for each of his favorite foods. Let us denote
these prices as $c_j$ for $j = 1, 2, \ldots, n$. In addition to prices, he also
looked at the labels and collected information about how much of the critical
nutrients are contained in one serving of each food. Let us denote by
$a_{ij}$ the amount of nutrient $i$ contained in food $j$. (Fortunately, he was able
to call his favorite pizza delivery service and get similar information from
them.) In terms of this information, he formulated the following linear
programming problem:

$$
\begin{aligned}
\text{minimize}\quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij} x_j \ge b_i, \qquad i = 1, 2, \ldots, m \\
& x_j \ge 0, \qquad j = 1, 2, \ldots, n.
\end{aligned}
$$

Formulate the dual to this linear program. Can you introduce another
person into the above story whose problem would naturally be to solve
the dual?

5.17 Saddle points. A function $h(y)$ defined for $y \in \mathbb{R}$ is called strongly convex
if

•  $h''(y) > 0$ for all $y \in \mathbb{R}$,

•  $\lim_{y \to -\infty} h'(y) = -\infty$, and

•  $\lim_{y \to \infty} h'(y) = \infty$.

A function $h$ is called strongly concave if $-h$ is strongly convex. Let
$\pi(x, y)$ be a function defined for $(x, y) \in \mathbb{R}^2$ and having the following
form

$$\pi(x, y) = f(x) - xy + g(y),$$

where $f$ is strongly concave and $g$ is strongly convex. Using elementary
calculus

1. Show that there is one and only one point $(x^*, y^*) \in \mathbb{R}^2$ at which the
gradient of $\pi$,

$$\nabla \pi = \begin{bmatrix} \partial \pi / \partial x \\ \partial \pi / \partial y \end{bmatrix},$$

vanishes. Hint: From the two equations obtained by setting the
derivatives to zero, derive two other relations having the form $x =
\phi(x)$ and $y = \psi(y)$. Then study the functions $\phi$ and $\psi$ to show that
there is one and only one solution.

2. Show that

$$\max_{x \in \mathbb{R}} \min_{y \in \mathbb{R}} \pi(x, y) = \pi(x^*, y^*) = \min_{y \in \mathbb{R}} \max_{x \in \mathbb{R}} \pi(x, y),$$

where $(x^*, y^*)$ denotes the “critical point” identified in Part 1 above.
(Note: Be sure to check the signs of the second derivatives for both
the inner and the outer optimizations.)

Associated with each strongly convex function $h$ is another function, called
the Legendre transform of $h$ and denoted by $L_h$, defined by

$$L_h(x) = \max_{y \in \mathbb{R}} \bigl( xy - h(y) \bigr), \qquad x \in \mathbb{R}.$$

3. Using elementary calculus, show that $L_h$ is strongly convex.

4. Show that

$$\max_{x \in \mathbb{R}} \min_{y \in \mathbb{R}} \pi(x, y) = \max_{x \in \mathbb{R}} \bigl( f(x) - L_g(x) \bigr)$$

and that

$$\min_{y \in \mathbb{R}} \max_{x \in \mathbb{R}} \pi(x, y) = \min_{y \in \mathbb{R}} \bigl( g(y) + L_{-f}(-y) \bigr).$$

5. Show that the Legendre transform of the Legendre transform of a
function is the function itself. That is,

$$L_{L_h}(z) = h(z) \qquad \text{for all } z \in \mathbb{R}.$$

Hint: This can be proved from scratch but it is easier to use the result
of Part 2 above.


Notes 

The idea behind the strong duality theorem can be traced back to conversations
between G.B. Dantzig and J. von Neumann in the fall of 1947, but an explicit statement
did not surface until the paper of Gale et al. (1951). The term primal problem
was coined by G.B. Dantzig’s father, T. Dantzig. The dual simplex method was first
proposed by Lemke (1954).

The  solution  to  Exercise  5.13  (which  is  left  to  the  reader  to  supply)  suggests  that 
a  random  linear  programming  problem  is  infeasible  with  probability  1/4,  unbounded 
with  probability  1/4,  and  has  an  optimal  solution  with  probability  1/2. 


CHAPTER  6 


The  Simplex  Method  in  Matrix  Notation 


So  far,  we  have  avoided  using  matrix  notation  to  present  linear  programming 
problems  and  the  simplex  method.  In  this  chapter,  we  shall  recast  everything  into 
matrix  notation.  At  the  same  time,  we  will  emphasize  the  close  relations  between 
the  primal  and  the  dual  problems. 


1.  Matrix  Notation 


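As usual, we begin our discussion with the standard-form linear programming
problem:

$$
\begin{aligned}
\text{maximize}\quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij} x_j \le b_i, \qquad i = 1, 2, \ldots, m \\
& x_j \ge 0, \qquad j = 1, 2, \ldots, n.
\end{aligned}
$$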

In the past, we have generally denoted slack variables by $w_i$'s but have noted that
sometimes it is convenient just to string them onto the end of the list of original
variables. Such is the case now, and so we introduce slack variables as follows:

$$x_{n+i} = b_i - \sum_{j=1}^{n} a_{ij} x_j, \qquad i = 1, 2, \ldots, m.$$

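With these slack variables, we now write our problem in matrix form:

$$
\begin{aligned}
\text{maximize}\quad & c^T x \\
\text{subject to}\quad & Ax = b \\
& x \ge 0,
\end{aligned}
$$

where

$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & 1 & & & \\
a_{21} & a_{22} & \cdots & a_{2n} & & 1 & & \\
\vdots & \vdots & & \vdots & & & \ddots & \\
a_{m1} & a_{m2} & \cdots & a_{mn} & & & & 1
\end{bmatrix},
\tag{6.1}
$$

$$
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix},
\qquad
c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \\ 0 \\ \vdots \\ 0 \end{bmatrix},
\qquad \text{and} \qquad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \\ x_{n+1} \\ \vdots \\ x_{n+m} \end{bmatrix}.
\tag{6.2}
$$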

As we know, the simplex method is an iterative procedure in which each iteration
is characterized by specifying which $m$ of the $n + m$ variables are basic. As
before, we denote by $\mathcal{B}$ the set of indices corresponding to the basic variables, and
we denote by $\mathcal{N}$ the remaining nonbasic indices.

In component notation, the $i$th component of $Ax$ can be broken up into a basic
part and a nonbasic part:

$$\sum_{j=1}^{n+m} a_{ij} x_j = \sum_{j \in \mathcal{B}} a_{ij} x_j + \sum_{j \in \mathcal{N}} a_{ij} x_j. \tag{6.3}$$

We wish to introduce a notation for matrices that will allow us to break up the matrix
product $Ax$ analogously. To this end, let $B$ denote an $m \times m$ matrix whose columns
consist precisely of the $m$ columns of $A$ that are associated with the basic variables.
Similarly, let $N$ denote an $m \times n$ matrix whose columns are the $n$ nonbasic columns
of $A$. Then we write $A$ in a partitioned-matrix form as follows:

$$A = \begin{bmatrix} B & N \end{bmatrix}.$$


Strictly  speaking,  the  matrix  on  the  right  does  not  equal  the  A  matrix.  Instead,  it 
is  the  A  matrix  with  its  columns  rearranged  in  such  a  manner  that  all  the  columns 
associated  with  basic  variables  are  listed  first  followed  by  the  nonbasic  columns. 
Nonetheless,  as  long  as  we  are  consistent  and  rearrange  the  rows  of  x  in  the  same 
way,  then  no  harm  is  done.  Indeed,  let  us  similarly  rearrange  the  rows  of  x  and  write 


$$x = \begin{bmatrix} x_{\mathcal{B}} \\ x_{\mathcal{N}} \end{bmatrix}.$$


Then  the  following  separation  of  Ax  into  a  sum  of  two  terms  is  true  and  captures 
the  same  separation  into  basic  and  nonbasic  parts  as  we  had  in  (6.3): 


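$$Ax = \begin{bmatrix} B & N \end{bmatrix} \begin{bmatrix} x_{\mathcal{B}} \\ x_{\mathcal{N}} \end{bmatrix} = B x_{\mathcal{B}} + N x_{\mathcal{N}}.$$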


By  similarly  partitioning  c,  we  can  write 


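$$c^T x = \begin{bmatrix} c_{\mathcal{B}} \\ c_{\mathcal{N}} \end{bmatrix}^T \begin{bmatrix} x_{\mathcal{B}} \\ x_{\mathcal{N}} \end{bmatrix} = c_{\mathcal{B}}^T x_{\mathcal{B}} + c_{\mathcal{N}}^T x_{\mathcal{N}}.$$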
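
The partitioned products can be checked numerically. The following is a minimal sketch in Python, assuming a small 3 x 5 matrix of the form [A I], an arbitrary vector x, and 0-based index lists named `basic` and `nonbasic`; these names and values are illustrative choices, not part of the text.

```python
# Illustrative data (assumed): a matrix of the form [A I] with m = 3, n = 2.
A = [[1, -1, 1, 0, 0],
     [2, -1, 0, 1, 0],
     [0,  1, 0, 0, 1]]
c = [4, 3, 0, 0, 0]
x = [2, 1, 0, 0, 4]            # any vector of length n + m works here

basic = [0, 1, 4]              # 0-based indices of the basic variables
nonbasic = [2, 3]              # the remaining indices

def columns(M, idx):
    """Return the submatrix of M made of the columns listed in idx."""
    return [[row[j] for j in idx] for row in M]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

B, N = columns(A, basic), columns(A, nonbasic)
x_B, x_N = [x[j] for j in basic], [x[j] for j in nonbasic]
c_B, c_N = [c[j] for j in basic], [c[j] for j in nonbasic]

Ax = matvec(A, x)
Ax_split = [p + q for p, q in zip(matvec(B, x_B), matvec(N, x_N))]   # B x_B + N x_N
ctx = dot(c, x)
ctx_split = dot(c_B, x_B) + dot(c_N, x_N)                            # c_B^T x_B + c_N^T x_N
```

Because the columns of A and the entries of x and c are permuted in the same way, both split expressions agree with the unpartitioned products.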
2. The Primal Simplex Method

A  dictionary  has  the  property  that  the  basic  variables  are  written  as  functions  of 
the  nonbasic  variables.  In  matrix  notation,  we  see  that  the  constraint  equations 

Ax  =  b 


can  be  written  as 


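$$B x_{\mathcal{B}} + N x_{\mathcal{N}} = b.$$

The fact that the basic variables $x_{\mathcal{B}}$ can be written as a function of the nonbasic
variables $x_{\mathcal{N}}$ is equivalent to the fact that the matrix B is invertible, and hence,

$$x_{\mathcal{B}} = B^{-1} b - B^{-1} N x_{\mathcal{N}}. \tag{6.4}$$

(The fact that B is invertible means that its m column vectors are linearly independent
and therefore form a basis for $\mathbb{R}^m$; this is why the basic variables are called
basic, in case you were wondering.) Similarly, the objective function can be written
as

$$\begin{aligned}
\zeta &= c_{\mathcal{B}}^T x_{\mathcal{B}} + c_{\mathcal{N}}^T x_{\mathcal{N}} \\
&= c_{\mathcal{B}}^T \left( B^{-1} b - B^{-1} N x_{\mathcal{N}} \right) + c_{\mathcal{N}}^T x_{\mathcal{N}} \\
&= c_{\mathcal{B}}^T B^{-1} b - \left( (B^{-1} N)^T c_{\mathcal{B}} - c_{\mathcal{N}} \right)^T x_{\mathcal{N}}.
\end{aligned} \tag{6.5}$$

Combining (6.5) and (6.4), we see that we can write the dictionary associated
with basis $\mathcal{B}$ as

$$\begin{aligned}
\zeta &= c_{\mathcal{B}}^T B^{-1} b - \left( (B^{-1} N)^T c_{\mathcal{B}} - c_{\mathcal{N}} \right)^T x_{\mathcal{N}} \\
x_{\mathcal{B}} &= B^{-1} b - B^{-1} N x_{\mathcal{N}}.
\end{aligned} \tag{6.6}$$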


Comparing  against  the  component-form  notation  of  Chapter  2  (see  (2.6)),  we  make 
the  following  identifications: 


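$$\begin{aligned}
c_{\mathcal{B}}^T B^{-1} b &= \bar{\zeta} \\
c_{\mathcal{N}} - (B^{-1} N)^T c_{\mathcal{B}} &= [\bar{c}_j] \\
B^{-1} b &= [\bar{b}_i] \\
B^{-1} N &= [\bar{a}_{ij}],
\end{aligned}$$

where the bracketed expressions on the right denote vectors and matrices with the
index i running over $\mathcal{B}$ and the index j running over $\mathcal{N}$. The basic solution associated
with dictionary (6.6) is obtained by setting $x_{\mathcal{N}}$ equal to zero:

$$\begin{aligned}
x_{\mathcal{N}}^* &= 0, \\
x_{\mathcal{B}}^* &= B^{-1} b.
\end{aligned} \tag{6.7}$$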
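
As an illustration, the quantities appearing in dictionary (6.6) can be computed for a concrete basis. The sketch below is an assumption-laden example rather than part of the text: it uses exact rational arithmetic, a small helper named `solve` (Gauss-Jordan elimination, written here only for illustration), the constraint matrix of a small three-constraint problem, and the basis {1, 2, 5} written with 0-based indices.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M X = rhs exactly by Gauss-Jordan elimination (M square, rhs a list of rows)."""
    n = len(M)
    aug = [[Fraction(v) for v in M[r]] + [Fraction(v) for v in rhs[r]] for r in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)   # raises if M is singular
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * bb for a, bb in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Illustrative data (assumed): the matrix [A I], right-hand side b, and costs c.
A = [[1, -1, 1, 0, 0],
     [2, -1, 0, 1, 0],
     [0,  1, 0, 0, 1]]
b = [1, 3, 5]
c = [4, 3, 0, 0, 0]
basic, nonbasic = [0, 1, 4], [2, 3]     # 0-based version of B = {1, 2, 5}, N = {3, 4}

B = [[row[j] for j in basic] for row in A]
N = [[row[j] for j in nonbasic] for row in A]
c_B = [c[j] for j in basic]
c_N = [c[j] for j in nonbasic]

x_B = [r[0] for r in solve(B, [[v] for v in b])]     # B^{-1} b, the basic solution (6.7)
BinvN = solve(B, N)                                   # B^{-1} N
zeta = sum(cb * xb for cb, xb in zip(c_B, x_B))       # c_B^T B^{-1} b
# (B^{-1} N)^T c_B - c_N: the negated objective-row coefficients of (6.6)
z_N = [sum(BinvN[i][k] * c_B[i] for i in range(len(basic))) - c_N[k]
       for k in range(len(nonbasic))]
```

Note that B is never inverted explicitly; the sketch solves linear systems with B instead, which is enough to obtain every entry of the dictionary.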


As we saw in the last chapter, associated with each primal dictionary there is
a dual dictionary that is simply the negative-transpose of the primal. However, to
have the negative-transpose property, it is important to correctly associate complementary
pairs of variables. So first we recall that, for the current discussion, we
have appended the primal slack variables to the end of the original variables:

$$(x_1, \ldots, x_n, w_1, \ldots, w_m) \longrightarrow (x_1, \ldots, x_n, x_{n+1}, \ldots, x_{n+m}).$$

Also recall that the dual slack variables are complementary to the original primal
variables and that the original dual variables are complementary to the primal slack
variables. Therefore, to maintain the desired complementarity condition between
like indices in the primal and the dual, we need to relabel the dual variables and
append them to the end of the dual slacks:

$$(z_1, \ldots, z_n, y_1, \ldots, y_m) \longrightarrow (z_1, \ldots, z_n, z_{n+1}, \ldots, z_{n+m}).$$


With  this  relabeling  of  the  dual  variables,  the  dual  dictionary  corresponding  to 
(6.6)  is 

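$$\begin{aligned}
-\xi &= -c_{\mathcal{B}}^T B^{-1} b - (B^{-1} b)^T z_{\mathcal{B}} \\
z_{\mathcal{N}} &= (B^{-1} N)^T c_{\mathcal{B}} - c_{\mathcal{N}} + (B^{-1} N)^T z_{\mathcal{B}}.
\end{aligned}$$

The dual solution associated with this dictionary is obtained by setting $z_{\mathcal{B}}$ equal to
zero:

$$\begin{aligned}
z_{\mathcal{B}}^* &= 0, \\
z_{\mathcal{N}}^* &= (B^{-1} N)^T c_{\mathcal{B}} - c_{\mathcal{N}}.
\end{aligned} \tag{6.8}$$

Using (6.7) and (6.8) and introducing the shorthand

$$\zeta^* = c_{\mathcal{B}}^T B^{-1} b, \tag{6.9}$$

we see that we can write the primal dictionary succinctly as

$$\begin{aligned}
\zeta &= \zeta^* - (z_{\mathcal{N}}^*)^T x_{\mathcal{N}} \\
x_{\mathcal{B}} &= x_{\mathcal{B}}^* - B^{-1} N x_{\mathcal{N}}.
\end{aligned} \tag{6.10}$$

The associated dual dictionary then has a very symmetric appearance:

$$\begin{aligned}
-\xi &= -\zeta^* - (x_{\mathcal{B}}^*)^T z_{\mathcal{B}} \\
z_{\mathcal{N}} &= z_{\mathcal{N}}^* + (B^{-1} N)^T z_{\mathcal{B}}.
\end{aligned} \tag{6.11}$$
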

The  (primal)  simplex  method  can  be  described  briefly  as  follows.  The  starting 
assumptions  are  that  we  are  given 

(1) A partition of the n + m indices into a collection $\mathcal{B}$ of m basic indices and
a collection $\mathcal{N}$ of n nonbasic ones with the property that the basis matrix
B is invertible,

(2) An associated current primal solution $x_{\mathcal{B}}^* \ge 0$ (and $x_{\mathcal{N}}^* = 0$), and

(3) An associated current dual solution $z_{\mathcal{N}}^*$ (with $z_{\mathcal{B}}^* = 0$)

such that the dictionary given by (6.10) represents the primal objective function and
the primal constraints. The simplex method then produces a sequence of steps to
"adjacent" bases such that the current value $\zeta^*$ of the objective function $\zeta$ increases
at each step (or, at least, would increase if the step size were positive), updating
$x_{\mathcal{B}}^*$ and $z_{\mathcal{N}}^*$ along the way. Two bases are said to be adjacent to each other if they
differ in only one index. That is, given a basis $\mathcal{B}$, an adjacent basis is determined
by removing one basic index and replacing it with a nonbasic index. The index that
gets removed corresponds to the leaving variable, whereas the index that gets added
corresponds to the entering variable.

One step of the simplex method is called an iteration. We now elaborate further
on the details by describing one iteration as a sequence of specific steps.

Step 1. Check for Optimality. If $z_{\mathcal{N}}^* \ge 0$, stop. The current solution is optimal.
To see this, first note that the simplex method always maintains primal feasibility
and complementarity. Indeed, the primal solution is feasible, since $x_{\mathcal{B}}^* \ge 0$ and
$x_{\mathcal{N}}^* = 0$ and the dictionary embodies the primal constraints. Also, the fact that
$x_{\mathcal{N}}^* = 0$ and $z_{\mathcal{B}}^* = 0$ implies that the primal and dual solutions are complementary.
Hence, all that is required for optimality is dual feasibility. But by looking at the
associated dual dictionary (6.11), we see that the dual solution is feasible if and only
if $z_{\mathcal{N}}^* \ge 0$.

Step 2. Select Entering Variable. Pick an index $j \in \mathcal{N}$ for which $z_j^* < 0$.
Variable $x_j$ is the entering variable.

Step 3. Compute Primal Step Direction $\Delta x_{\mathcal{B}}$. Having selected the entering
variable, it is our intention to let its value increase from zero. Hence, we let

$$x_{\mathcal{N}} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ t \\ 0 \\ \vdots \\ 0 \end{bmatrix} = t e_j \qquad (t \text{ in the } j\text{th position}),$$

where we follow the common convention of letting $e_j$ denote the unit vector that
is zero in every component except for a one in the position associated with index j
(note that, because of our index rearrangement conventions, this is not generally the
jth element of the vector). Then from (6.10), we have that

$$x_{\mathcal{B}} = x_{\mathcal{B}}^* - B^{-1} N t e_j.$$

Hence, we see that the step direction $\Delta x_{\mathcal{B}}$ for the primal basic variables is given by

$$\Delta x_{\mathcal{B}} = B^{-1} N e_j.$$

Step 4. Compute Primal Step Length. We wish to pick the largest $t \ge 0$ for
which every component of $x_{\mathcal{B}}$ remains nonnegative. That is, we wish to pick the
largest t for which

$$x_{\mathcal{B}}^* \ge t \Delta x_{\mathcal{B}}.$$

Since, for each $i \in \mathcal{B}$, $x_i^* \ge 0$ and $t \ge 0$, we can divide both sides of the above
inequality by these numbers and preserve the sense of the inequality. Therefore,
doing this division, we get the requirement that

$$\frac{1}{t} \ge \frac{\Delta x_i}{x_i^*}, \qquad \text{for all } i \in \mathcal{B}.$$

We want to let t be as large as possible, and so 1/t should be made as small as possible.
The smallest possible value for 1/t that satisfies all the required inequalities
is obviously

$$\frac{1}{t} = \max_{i \in \mathcal{B}} \frac{\Delta x_i}{x_i^*}.$$

Hence, the largest t for which all of the inequalities hold is given by

$$t = \left( \max_{i \in \mathcal{B}} \frac{\Delta x_i}{x_i^*} \right)^{-1}.$$

As always, the correct convention for 0/0 is to set such ratios to zero. Also, if the
maximum is less than or equal to zero, we can stop here: the primal is unbounded.
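
The ratio test of Step 4 can be sketched as follows. The function name `primal_step_length` and its return convention (a pair, or `(None, None)` when the problem is unbounded) are assumptions made for illustration. The treatment of a zero denominator with a nonzero numerator is also an assumption that extends the 0/0 convention in the natural way: a positive numerator gives an infinite ratio and hence t = 0, and a negative numerator can never attain the maximum.

```python
from fractions import Fraction

def primal_step_length(x_B, dx_B):
    """Return (t, position) with t = (max_i dx_i / x_i)^(-1), or (None, None) if unbounded."""
    best, pos = None, None
    for k, (xi, dxi) in enumerate(zip(x_B, dx_B)):
        if xi == 0:
            if dxi > 0:
                return Fraction(0), k      # ratio is +infinity, so the step length is zero
            if dxi < 0:
                continue                   # ratio is -infinity, never the maximum
            ratio = Fraction(0)            # the 0/0 convention
        else:
            ratio = Fraction(dxi) / Fraction(xi)
        if best is None or ratio > best:
            best, pos = ratio, k
    if best is None or best <= 0:
        return None, None                  # maximum is <= 0: the primal is unbounded
    return 1 / best, pos

t, pos = primal_step_length([2, 1, 4], [-1, -2, 2])
```

The returned position identifies a component that attains the maximum, which is exactly what the choice of leaving variable requires.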

Step 5. Select Leaving Variable. The leaving variable is chosen as any variable
$x_i$, $i \in \mathcal{B}$, for which the maximum in the calculation of t is obtained.

Step 6. Compute Dual Step Direction $\Delta z_{\mathcal{N}}$. Essentially all that remains is to
explain how $z_{\mathcal{N}}^*$ changes. To see how, it is convenient to look at the dual dictionary.
Since in that dictionary $z_i$ is the entering variable, we see that

$$\Delta z_{\mathcal{N}} = -(B^{-1} N)^T e_i.$$

Step 7. Compute Dual Step Length. Since we know that $z_j$ is the leaving
variable in the dual dictionary, we see immediately that the step length for the dual
variables is

$$s = \frac{z_j^*}{\Delta z_j}.$$

Step 8. Update Current Primal and Dual Solutions. We now have everything
we need to update the data in the dictionary:

$$\begin{aligned}
x_j^* &\leftarrow t \\
x_{\mathcal{B}}^* &\leftarrow x_{\mathcal{B}}^* - t \Delta x_{\mathcal{B}}
\end{aligned}$$

and

$$\begin{aligned}
z_i^* &\leftarrow s \\
z_{\mathcal{N}}^* &\leftarrow z_{\mathcal{N}}^* - s \Delta z_{\mathcal{N}}.
\end{aligned}$$

Step 9. Update Basis. Finally, we update the basis:

$$\mathcal{B} \leftarrow \mathcal{B} \setminus \{i\} \cup \{j\}.$$

We  close  this  section  with  the  important  remark  that  the  simplex  method  as 
presented  here,  while  it  may  look  different  from  the  component-form  presentation 
given  in  Chapter  2,  is  in  fact  mathematically  identical  to  it.  That  is,  given  the  same 
set  of  pivoting  rules  and  starting  from  the  same  primal  dictionary,  the  two  algorithms 
will  generate  exactly  the  same  sequence  of  dictionaries. 
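
The nine steps can be collected into a short program. What follows is a minimal sketch under stated assumptions, not a production implementation: it assumes a problem in standard form with $b \ge 0$ (so no Phase I is needed), uses exact rational arithmetic, picks the most negative $z_j^*$ as the entering rule, recomputes everything from scratch at each iteration, and takes no precautions against cycling. The names `solve` and `primal_simplex` are illustrative. The step length is computed as the minimum of $x_i^*/\Delta x_i$ over positive $\Delta x_i$, which is equivalent to the reciprocal of the maximum ratio in Step 4.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M X = rhs exactly by Gauss-Jordan elimination (M square, rhs a list of rows)."""
    n = len(M)
    aug = [[Fraction(v) for v in M[r]] + [Fraction(v) for v in rhs[r]] for r in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * bb for a, bb in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def primal_simplex(A, b, c):
    """Maximize c^T x subject to A x <= b, x >= 0, assuming b >= 0."""
    m, n = len(A), len(A[0])
    full = [[Fraction(v) for v in A[r]] + [Fraction(int(r == k)) for k in range(m)]
            for r in range(m)]                         # the matrix [A I] of (6.1)
    cost = [Fraction(v) for v in c] + [Fraction(0)] * m
    basic, nonbasic = list(range(n, n + m)), list(range(n))
    x_B = [Fraction(v) for v in b]                     # x_B^* = B^{-1} b = b initially
    z_N = [-cost[j] for j in nonbasic]                 # z_N^* = -c_N initially
    while any(z < 0 for z in z_N):                     # Step 1
        jpos = min(range(n), key=lambda k: z_N[k])     # Step 2: most negative z_j^*
        j = nonbasic[jpos]
        B = [[full[r][k] for k in basic] for r in range(m)]
        N = [[full[r][k] for k in nonbasic] for r in range(m)]
        # Step 3: dx_B = B^{-1} N e_j, i.e. B^{-1} applied to column j of [A I]
        dx_B = [row[0] for row in solve(B, [[full[r][j]] for r in range(m)])]
        # Steps 4-5: ratio test, written as min x_i / dx_i over dx_i > 0
        ratios = [(x_B[k] / dx_B[k], k) for k in range(m) if dx_B[k] > 0]
        if not ratios:
            raise ValueError("primal is unbounded")
        t, ipos = min(ratios)
        # Step 6: dz_N = -(B^{-1} N)^T e_i = -N^T (B^T)^{-1} e_i
        Bt = [list(col) for col in zip(*B)]
        v = [row[0] for row in solve(Bt, [[int(r == ipos)] for r in range(m)])]
        dz_N = [-sum(N[r][k] * v[r] for r in range(m)) for k in range(n)]
        s = z_N[jpos] / dz_N[jpos]                     # Step 7
        x_B = [xv - t * d for xv, d in zip(x_B, dx_B)] # Step 8
        x_B[ipos] = t
        z_N = [zv - s * d for zv, d in zip(z_N, dz_N)]
        z_N[jpos] = s
        basic[ipos], nonbasic[jpos] = j, basic[ipos]   # Step 9
    x = [Fraction(0)] * (n + m)
    for k, idx in enumerate(basic):
        x[idx] = x_B[k]
    return x[:n], sum(cv * xv for cv, xv in zip(cost, x))

solution, value = primal_simplex([[1, -1], [2, -1], [0, 1]], [1, 3, 5], [4, 3])
```

The usage line at the end maximizes $4x_1 + 3x_2$ subject to three constraints; only systems with B and its transpose are solved, and the matrix $B^{-1}N$ is never formed in full.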

3.  An  Example 

In  case  the  reader  is  feeling  at  this  point  that  there  are  too  many  letters  and 
not  enough  numbers,  here  is  an  example  that  illustrates  the  matrix  approach  to  the 
simplex  method.  The  problem  we  wish  to  solve  is 

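$$\begin{array}{lrcl}
\text{maximize} & 4x_1 + 3x_2 & & \\
\text{subject to} & x_1 - x_2 & \le & 1 \\
 & 2x_1 - x_2 & \le & 3 \\
 & x_2 & \le & 5 \\
 & x_1, x_2 & \ge & 0.
\end{array}$$

The matrix A is given by

$$\begin{bmatrix}
1 & -1 & 1 & & \\
2 & -1 & & 1 & \\
0 & 1 & & & 1
\end{bmatrix}.$$

(Note that some zeros have not been shown.) The initial sets of basic and nonbasic
indices are

$$\mathcal{B} = \{3, 4, 5\} \quad \text{and} \quad \mathcal{N} = \{1, 2\}.$$

Corresponding to these sets, we have the submatrices of A:

$$B = \begin{bmatrix} 1 & & \\ & 1 & \\ & & 1 \end{bmatrix}, \qquad
N = \begin{bmatrix} 1 & -1 \\ 2 & -1 \\ 0 & 1 \end{bmatrix}.$$

From (6.7) we see that the initial values of the basic variables are given by

$$x_{\mathcal{B}}^* = b = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix},$$

and from (6.8) the initial nonbasic dual variables are simply

$$z_{\mathcal{N}}^* = -c_{\mathcal{N}} = \begin{bmatrix} -4 \\ -3 \end{bmatrix}.$$

Since $x_{\mathcal{B}}^* \ge 0$, the initial solution is primal feasible, and hence we can apply the
simplex method without needing any Phase I procedure.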


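3.1. First Iteration. Step 1. Since $z_{\mathcal{N}}^*$ has some negative components, the
current solution is not optimal.

Step 2. Since $z_1^* = -4$ and this is the most negative of the two nonbasic dual
variables, we see that the entering index is

$$j = 1.$$

Step 3.

$$\Delta x_{\mathcal{B}} = B^{-1} N e_j = \begin{bmatrix} 1 & -1 \\ 2 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}.$$

Step 4.

$$t = \left( \max \left\{ \frac{1}{1}, \frac{2}{3}, \frac{0}{5} \right\} \right)^{-1} = 1.$$

Step 5. Since the ratio that achieved the maximum in Step 4 was the first ratio
and this ratio corresponds to basis index 3, we see that

$$i = 3.$$

Step 6.

$$\Delta z_{\mathcal{N}} = -(B^{-1} N)^T e_i = -\begin{bmatrix} 1 & 2 & 0 \\ -1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$

Step 7.

$$s = \frac{z_j^*}{\Delta z_j} = \frac{-4}{-1} = 4.$$

Step 8.

$$x_1^* = 1, \qquad x_{\mathcal{B}}^* = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} - 1 \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 5 \end{bmatrix},$$

$$z_3^* = 4, \qquad z_{\mathcal{N}}^* = \begin{bmatrix} -4 \\ -3 \end{bmatrix} - 4 \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ -7 \end{bmatrix}.$$

Step 9. The new sets of basic and nonbasic indices are

$$\mathcal{B} = \{1, 4, 5\} \quad \text{and} \quad \mathcal{N} = \{3, 2\}.$$

Corresponding to these sets, we have the new basic and nonbasic submatrices of A,

$$B = \begin{bmatrix} 1 & & \\ 2 & 1 & \\ 0 & & 1 \end{bmatrix}, \qquad
N = \begin{bmatrix} 1 & -1 \\ 0 & -1 \\ 0 & 1 \end{bmatrix},$$

and the new basic primal variables and nonbasic dual variables:

$$x_{\mathcal{B}}^* = \begin{bmatrix} x_1^* \\ x_4^* \\ x_5^* \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 5 \end{bmatrix}, \qquad
z_{\mathcal{N}}^* = \begin{bmatrix} z_3^* \\ z_2^* \end{bmatrix} = \begin{bmatrix} 4 \\ -7 \end{bmatrix}.$$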

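3.2. Second Iteration. Step 1. Since $z_{\mathcal{N}}^*$ has some negative components, the
current solution is not optimal.

Step 2. Since $z_2^* = -7$, we see that the entering index is

$$j = 2.$$

Step 3.

$$\Delta x_{\mathcal{B}} = B^{-1} N e_j = \begin{bmatrix} 1 & & \\ 2 & 1 & \\ 0 & & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & -1 \\ 0 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}.$$

Step 4.

$$t = \left( \max \left\{ \frac{-1}{1}, \frac{1}{1}, \frac{1}{5} \right\} \right)^{-1} = 1.$$

Step 5. Since the ratio that achieved the maximum in Step 4 was the second
ratio and this ratio corresponds to basis index 4, we see that

$$i = 4.$$

Step 6.

$$\Delta z_{\mathcal{N}} = -(B^{-1} N)^T e_i = -\begin{bmatrix} 1 & 0 & 0 \\ -1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 0 \\ & 1 & \\ & & 1 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}.$$

Step 7.

$$s = \frac{z_j^*}{\Delta z_j} = \frac{-7}{-1} = 7.$$

Step 8.

$$x_2^* = 1, \qquad x_{\mathcal{B}}^* = \begin{bmatrix} 1 \\ 1 \\ 5 \end{bmatrix} - 1 \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 4 \end{bmatrix},$$

$$z_4^* = 7, \qquad z_{\mathcal{N}}^* = \begin{bmatrix} 4 \\ -7 \end{bmatrix} - 7 \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} -10 \\ 0 \end{bmatrix}.$$

Step 9. The new sets of basic and nonbasic indices are

$$\mathcal{B} = \{1, 2, 5\} \quad \text{and} \quad \mathcal{N} = \{3, 4\}.$$

Corresponding to these sets, we have the new basic and nonbasic submatrices of A,

$$B = \begin{bmatrix} 1 & -1 & 0 \\ 2 & -1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \qquad
N = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix},$$

and the new basic primal variables and nonbasic dual variables:

$$x_{\mathcal{B}}^* = \begin{bmatrix} x_1^* \\ x_2^* \\ x_5^* \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}, \qquad
z_{\mathcal{N}}^* = \begin{bmatrix} z_3^* \\ z_4^* \end{bmatrix} = \begin{bmatrix} -10 \\ 7 \end{bmatrix}.$$
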

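3.3. Third Iteration. Step 1. Since $z_{\mathcal{N}}^*$ has some negative components, the
current solution is not optimal.

Step 2. Since $z_3^* = -10$, we see that the entering index is

$$j = 3.$$

Step 3.

$$\Delta x_{\mathcal{B}} = B^{-1} N e_j = \begin{bmatrix} 1 & -1 & 0 \\ 2 & -1 & 0 \\ 0 & 1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ -2 \\ 2 \end{bmatrix}.$$

Step 4.

$$t = \left( \max \left\{ \frac{-1}{2}, \frac{-2}{1}, \frac{2}{4} \right\} \right)^{-1} = 2.$$

Step 5. Since the ratio that achieved the maximum in Step 4 was the third ratio
and this ratio corresponds to basis index 5, we see that

$$i = 5.$$

Step 6.

$$\Delta z_{\mathcal{N}} = -(B^{-1} N)^T e_i = -\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 & 0 \\ -1 & -1 & 1 \\ 0 & 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}.$$

Step 7.

$$s = \frac{z_j^*}{\Delta z_j} = \frac{-10}{-2} = 5.$$

Step 8.

$$x_3^* = 2, \qquad x_{\mathcal{B}}^* = \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix} - 2 \begin{bmatrix} -1 \\ -2 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \\ 0 \end{bmatrix},$$

$$z_5^* = 5, \qquad z_{\mathcal{N}}^* = \begin{bmatrix} -10 \\ 7 \end{bmatrix} - 5 \begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \end{bmatrix}.$$

Step 9. The new sets of basic and nonbasic indices are

$$\mathcal{B} = \{1, 2, 3\} \quad \text{and} \quad \mathcal{N} = \{5, 4\}.$$

Corresponding to these sets, we have the new basic and nonbasic submatrices of A,

$$B = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad
N = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix},$$

and the new basic primal variables and nonbasic dual variables:

$$x_{\mathcal{B}}^* = \begin{bmatrix} x_1^* \\ x_2^* \\ x_3^* \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \\ 2 \end{bmatrix}, \qquad
z_{\mathcal{N}}^* = \begin{bmatrix} z_5^* \\ z_4^* \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \end{bmatrix}.$$
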

3.4. Fourth Iteration. Step 1. Since $z_{\mathcal{N}}^*$ has all nonnegative components, the
current solution is optimal. The optimal objective function value is

$$\zeta^* = 4x_1^* + 3x_2^* = 31.$$
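
The optimality claim can be double-checked independently of the pivoting. The following is a small sketch, assuming the primal values $x^* = (4, 5)$ and the dual values $y = (z_3^*, z_4^*, z_5^*) = (0, 2, 5)$ read off from the final iteration; the variable names are illustrative. It verifies primal feasibility, dual feasibility, complementarity, and equality of the two objective values.

```python
A = [[1, -1], [2, -1], [0, 1]]
b = [1, 3, 5]
c = [4, 3]
x = [4, 5]          # optimal primal values x_1^*, x_2^*
y = [0, 2, 5]       # dual values z_3^*, z_4^*, z_5^* from the final iteration

# Primal slacks w = b - A x and dual slacks z = A^T y - c
w = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
z = [sum(A[i][j] * y[i] for i in range(len(b))) - c[j] for j in range(len(c))]

primal_value = sum(cj * xj for cj, xj in zip(c, x))
dual_value = sum(bi * yi for bi, yi in zip(b, y))
feasible = all(v >= 0 for v in x + y + w + z)
complementary = (all(xj * zj == 0 for xj, zj in zip(x, z))
                 and all(wi * yi == 0 for wi, yi in zip(w, y)))
```

Feasibility of both solutions together with equal objective values certifies optimality, regardless of how the solutions were found.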

It is undoubtedly clear at this point that the matrix approach, as we have presented
it, is quite a bit more tedious than the dictionary manipulations with which
we are quite familiar. The reason is that, with the dictionary approach, dictionary
entries get updated from one iteration to the next and the updating process is fairly
easy, whereas with the matrix approach, we continually compute everything from
scratch and therefore end up solving many systems of equations. In the next chapter,
we will deal with this issue and show that these systems of equations don't really
have to be solved from scratch each time; instead, there is a certain updating that
can be done that is quite analogous to the updating of a dictionary. However, before
we take up such practical considerations, let us finish our general discussion
of the simplex method by casting the dual simplex method into matrix notation and
discussing some related issues.


4.  The  Dual  Simplex  Method 

In the presentation of the primal simplex method given in the previous section,
we tried to make the symmetry between the primal and the dual problems as evident
as possible. One advantage of this approach is that we can now easily write down
the dual simplex method. Instead of assuming that the primal dictionary is feasible
($x_{\mathcal{B}}^* \ge 0$), we now assume that the dual dictionary is feasible ($z_{\mathcal{N}}^* \ge 0$) and perform
the analogous steps:

Step 1. Check for Optimality. If $x_{\mathcal{B}}^* \ge 0$, stop. The current solution is optimal.
Note that for the dual simplex method, dual feasibility and complementarity are
maintained from the beginning, and the algorithm terminates once a primal feasible
solution is discovered.

Step 2. Select Entering Variable. Pick an index $i \in \mathcal{B}$ for which $x_i^* < 0$.
Variable $z_i$ is the entering variable.

Step 3. Compute Dual Step Direction $\Delta z_{\mathcal{N}}$. From the dual dictionary, we see
that

$$\Delta z_{\mathcal{N}} = -(B^{-1} N)^T e_i.$$

Step 4. Compute Dual Step Length. We wish to pick the largest $s \ge 0$ for which
every component of $z_{\mathcal{N}}$ remains nonnegative. As in the primal simplex method, this
computation involves computing the maximum of some ratios:

$$s = \left( \max_{j \in \mathcal{N}} \frac{\Delta z_j}{z_j^*} \right)^{-1}.$$

If s is not positive, then stop here: the dual is unbounded (implying, of course, that
the primal is infeasible).

Step 5. Select Leaving Variable. The leaving variable is chosen as any variable
$z_j$, $j \in \mathcal{N}$, for which the maximum in the calculation of s is obtained.

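Step 6. Compute Primal Step Direction $\Delta x_{\mathcal{B}}$. To see how $x_{\mathcal{B}}^*$ changes in the
dual dictionary, it is convenient to look at the primal dictionary. Since in that dictionary
$x_j$ is the entering variable, we see that

$$\Delta x_{\mathcal{B}} = B^{-1} N e_j.$$

Step 7. Compute Primal Step Length. Since we know that $x_i$ is the leaving
variable in the primal dictionary, we see immediately that the step length for the
primal variables is

$$t = \frac{x_i^*}{\Delta x_i}.$$

Step 8. Update Current Primal and Dual Solutions. We now have everything
we need to update the data in the dictionary:

$$\begin{aligned}
x_j^* &\leftarrow t \\
x_{\mathcal{B}}^* &\leftarrow x_{\mathcal{B}}^* - t \Delta x_{\mathcal{B}}
\end{aligned}$$

and

$$\begin{aligned}
z_i^* &\leftarrow s \\
z_{\mathcal{N}}^* &\leftarrow z_{\mathcal{N}}^* - s \Delta z_{\mathcal{N}}.
\end{aligned}$$

Step 9. Update Basis. Finally, we update the basis:

$$\mathcal{B} \leftarrow \mathcal{B} \setminus \{i\} \cup \{j\}.$$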


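To further emphasize the similarities between the primal and the dual simplex
methods, Figure 6.1 shows the two algorithms side by side.

Primal Simplex

    Suppose $x_{\mathcal{B}}^* \ge 0$
    while ($z_{\mathcal{N}}^* \not\ge 0$) {
        pick $j \in \{ j \in \mathcal{N} : z_j^* < 0 \}$
        $\Delta x_{\mathcal{B}} = B^{-1} N e_j$
        $t = \left( \max_{i \in \mathcal{B}} \Delta x_i / x_i^* \right)^{-1}$
        pick $i \in \operatorname{argmax}_{i \in \mathcal{B}} \Delta x_i / x_i^*$
        $\Delta z_{\mathcal{N}} = -(B^{-1} N)^T e_i$
        $s = z_j^* / \Delta z_j$
        $x_j^* \leftarrow t$
        $x_{\mathcal{B}}^* \leftarrow x_{\mathcal{B}}^* - t \Delta x_{\mathcal{B}}$
        $z_i^* \leftarrow s$
        $z_{\mathcal{N}}^* \leftarrow z_{\mathcal{N}}^* - s \Delta z_{\mathcal{N}}$
        $\mathcal{B} \leftarrow \mathcal{B} \setminus \{i\} \cup \{j\}$
    }

Dual Simplex

    Suppose $z_{\mathcal{N}}^* \ge 0$
    while ($x_{\mathcal{B}}^* \not\ge 0$) {
        pick $i \in \{ i \in \mathcal{B} : x_i^* < 0 \}$
        $\Delta z_{\mathcal{N}} = -(B^{-1} N)^T e_i$
        $s = \left( \max_{j \in \mathcal{N}} \Delta z_j / z_j^* \right)^{-1}$
        pick $j \in \operatorname{argmax}_{j \in \mathcal{N}} \Delta z_j / z_j^*$
        $\Delta x_{\mathcal{B}} = B^{-1} N e_j$
        $t = x_i^* / \Delta x_i$
        $x_j^* \leftarrow t$
        $x_{\mathcal{B}}^* \leftarrow x_{\mathcal{B}}^* - t \Delta x_{\mathcal{B}}$
        $z_i^* \leftarrow s$
        $z_{\mathcal{N}}^* \leftarrow z_{\mathcal{N}}^* - s \Delta z_{\mathcal{N}}$
        $\mathcal{B} \leftarrow \mathcal{B} \setminus \{i\} \cup \{j\}$
    }

Figure 6.1. The primal and the dual simplex methods.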
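
The dual simplex column of Figure 6.1 can be sketched in code in the same style as the primal method. This is a minimal illustration under stated assumptions: the problem is in standard form with $c \le 0$ (so the starting dictionary is dual feasible), the most negative $x_i^*$ is chosen in Step 2, arithmetic is exact, and no anti-cycling rule is used. The names `solve` and `dual_simplex` and the small test problem are illustrative choices. The step length s is computed as the minimum of $z_j^*/\Delta z_j$ over positive $\Delta z_j$, which is equivalent to the reciprocal of the maximum ratio.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M X = rhs exactly by Gauss-Jordan elimination (M square, rhs a list of rows)."""
    n = len(M)
    aug = [[Fraction(v) for v in M[r]] + [Fraction(v) for v in rhs[r]] for r in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * bb for a, bb in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def dual_simplex(A, b, c):
    """Maximize c^T x subject to A x <= b, x >= 0, assuming c <= 0."""
    m, n = len(A), len(A[0])
    full = [[Fraction(v) for v in A[r]] + [Fraction(int(r == k)) for k in range(m)]
            for r in range(m)]                          # the matrix [A I]
    cost = [Fraction(v) for v in c] + [Fraction(0)] * m
    basic, nonbasic = list(range(n, n + m)), list(range(n))
    x_B = [Fraction(v) for v in b]
    z_N = [-cost[j] for j in nonbasic]
    while any(xv < 0 for xv in x_B):                    # Step 1
        ipos = min(range(m), key=lambda k: x_B[k])      # Step 2: most negative x_i^*
        B = [[full[r][k] for k in basic] for r in range(m)]
        N = [[full[r][k] for k in nonbasic] for r in range(m)]
        # Step 3: dz_N = -(B^{-1} N)^T e_i = -N^T (B^T)^{-1} e_i
        Bt = [list(col) for col in zip(*B)]
        v = [row[0] for row in solve(Bt, [[int(r == ipos)] for r in range(m)])]
        dz_N = [-sum(N[r][k] * v[r] for r in range(m)) for k in range(n)]
        # Steps 4-5: ratio test, written as min z_j / dz_j over dz_j > 0
        ratios = [(z_N[k] / dz_N[k], k) for k in range(n) if dz_N[k] > 0]
        if not ratios:
            raise ValueError("dual is unbounded, so the primal is infeasible")
        s, jpos = min(ratios)
        j = nonbasic[jpos]
        # Step 6: dx_B = B^{-1} N e_j
        dx_B = [row[0] for row in solve(B, [[full[r][j]] for r in range(m)])]
        t = x_B[ipos] / dx_B[ipos]                      # Step 7
        x_B = [xv - t * d for xv, d in zip(x_B, dx_B)]  # Step 8
        x_B[ipos] = t
        z_N = [zv - s * d for zv, d in zip(z_N, dz_N)]
        z_N[jpos] = s
        basic[ipos], nonbasic[jpos] = j, basic[ipos]    # Step 9
    x = [Fraction(0)] * (n + m)
    for k, idx in enumerate(basic):
        x[idx] = x_B[k]
    return x[:n], sum(cv * xv for cv, xv in zip(cost, x))

# Illustrative problem (assumed): maximize -x1 - x2 with a right-hand side of mixed sign.
solution, value = dual_simplex([[-2, -1], [-2, 4], [-1, 3]], [4, -8, -7], [-1, -1])
```

Comparing this sketch with a primal implementation makes the symmetry of Figure 6.1 concrete: the roles of $x_{\mathcal{B}}^*$ and $z_{\mathcal{N}}^*$, and of the two step directions, are simply exchanged.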

5.  Two-Phase  Methods 

Let  us  summarize  the  algorithm  obtained  by  applying  the  dual  simplex  method 
as  a  Phase  I  procedure  followed  by  the  primal  simplex  method  as  a  Phase  II.  Initially, 
we  set 


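$$\mathcal{B} = \{n+1, n+2, \ldots, n+m\} \quad \text{and} \quad \mathcal{N} = \{1, 2, \ldots, n\}.$$

Then from (6.1) we see that $A = \begin{bmatrix} N & B \end{bmatrix}$, where

$$N = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}, \qquad
B = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix},$$

and from (6.2) we have

$$c_{\mathcal{N}} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \quad \text{and} \quad
c_{\mathcal{B}} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

Substituting these expressions into the definitions of $x_{\mathcal{B}}^*$, $z_{\mathcal{N}}^*$, and $\zeta^*$, we find that

$$\begin{aligned}
x_{\mathcal{B}}^* &= B^{-1} b = b \\
z_{\mathcal{N}}^* &= (B^{-1} N)^T c_{\mathcal{B}} - c_{\mathcal{N}} = -c_{\mathcal{N}} \\
\zeta^* &= 0.
\end{aligned}$$

Hence, the initial dictionary reads:

$$\begin{aligned}
\zeta &= c_{\mathcal{N}}^T x_{\mathcal{N}} \\
x_{\mathcal{B}} &= b - N x_{\mathcal{N}}.
\end{aligned}$$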

If b has all nonnegative components and $c_{\mathcal{N}}$ has all nonpositive components,
then this dictionary is optimal: the problem was trivial. Suppose, however, that one
of these two vectors (but not both) has components of the wrong sign. For example,
suppose that b is okay (all nonnegative components) but $c_{\mathcal{N}}$ has some positive
components. Then this dictionary is primal feasible, and we can start immediately
with the primal simplex method. On the other hand, suppose that $c_{\mathcal{N}}$ has all nonpositive
components but b has some negative ones. Then the starting dictionary is
dual feasible, and we can commence immediately with the dual simplex algorithm.

The last, and most common, case is where both b and $c_{\mathcal{N}}$ have components of
the wrong sign. In this case, we must employ a two-phase procedure. There are two
choices. We could temporarily replace $c_{\mathcal{N}}$ with another vector that is nonpositive.
Then the modified problem is dual feasible, and so we can apply the dual simplex
method to find an optimal solution of this modified problem. After that, the original
objective function could be reinstated. With the original objective function, the
optimal solution from Phase I is most likely not optimal, but it is feasible, and
therefore the primal simplex method can be used to find the optimal solution to the
original problem.

The other choice would be to modify b instead of $c_{\mathcal{N}}$, thereby obtaining a primal
feasible solution to a modified problem. Then we would use the primal simplex
method on the modified problem to obtain its optimal solution, which will then be
dual feasible for the original problem, and so the dual simplex method can be used
to finish the problem.
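
The case analysis above amounts to a simple test on the signs of b and c. The sketch below is illustrative only: the function names `starting_method` and `phase_one_cost` and the returned labels are assumptions, and the all-minus-ones vector is just one possible nonpositive replacement for the objective in Phase I.

```python
def starting_method(b, c):
    """Classify the initial dictionary (slack basis) by the signs of b and c."""
    primal_feasible = all(v >= 0 for v in b)
    dual_feasible = all(v <= 0 for v in c)
    if primal_feasible and dual_feasible:
        return "optimal"
    if primal_feasible:
        return "primal simplex"
    if dual_feasible:
        return "dual simplex"
    return "two-phase"

def phase_one_cost(c):
    """One possible nonpositive stand-in for c (an assumed choice): all entries -1."""
    return [-1] * len(c)

cases = [
    starting_method([1, 3, 5], [4, 3]),          # b >= 0, some c_j > 0
    starting_method([4, -8, -7], [-1, -1]),      # c <= 0, some b_i < 0
    starting_method([-6, 4], [-6, 32, -9]),      # both have entries of the wrong sign
    starting_method([1], [-1]),                  # nothing to do
]
```

With the temporary objective from `phase_one_cost`, the modified problem is dual feasible by construction, so the dual simplex method can serve as Phase I before the original objective is reinstated.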


6.  Negative  Transpose  Property 

In our discussion of duality in Chapter 5, we emphasized the symmetry between
the primal problem and its dual. This symmetry can be easily summarized
by saying that the dual of a standard-form linear programming problem is the negative
transpose of the primal problem. Now, in this chapter, the symmetry appears
to have been lost. For example, the basis matrix is an m x m matrix. Why m x m
and not n x n? It seems strange. In fact, if we had started with the dual problem,
added slack variables to it, and introduced a basis matrix on that side it would be an
n x n matrix. How are these two basis matrices related? It turns out that they are
not themselves related in any simple way, but the important matrix $B^{-1}N$ is still
the negative transpose of the analogous dual construct. The purpose of this section
is to make this connection clear.

Consider a standard-form linear programming problem

$$\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & Ax \le b \\
 & x \ge 0,
\end{array}$$

and its dual

$$\begin{array}{ll}
\text{minimize} & b^T y \\
\text{subject to} & A^T y \ge c \\
 & y \ge 0.
\end{array}$$

Let w be a vector containing the slack variables for the primal problem, let z be a
slack vector for the dual problem, and write both problems in equality form:

$$\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & Ax + w = b \\
 & x, w \ge 0,
\end{array}$$

and

$$\begin{array}{ll}
\text{minimize} & b^T y \\
\text{subject to} & A^T y - z = c \\
 & y, z \ge 0.
\end{array}$$


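Introducing three new notations,

$$\bar{A} = \begin{bmatrix} A & I \end{bmatrix}, \qquad
\bar{c} = \begin{bmatrix} c \\ 0 \end{bmatrix}, \qquad \text{and} \qquad
\bar{x} = \begin{bmatrix} x \\ w \end{bmatrix},$$

the primal problem can be rewritten succinctly as follows:

$$\begin{array}{ll}
\text{maximize} & \bar{c}^T \bar{x} \\
\text{subject to} & \bar{A} \bar{x} = b \\
 & \bar{x} \ge 0.
\end{array}$$

Similarly, using "hats" for new notations on the dual side,

$$\hat{A} = \begin{bmatrix} -I & A^T \end{bmatrix}, \qquad
\hat{b} = \begin{bmatrix} 0 \\ b \end{bmatrix}, \qquad \text{and} \qquad
\hat{y} = \begin{bmatrix} z \\ y \end{bmatrix},$$

the dual problem can be rewritten in this way:

$$\begin{array}{ll}
\text{minimize} & \hat{b}^T \hat{y} \\
\text{subject to} & \hat{A} \hat{y} = c \\
 & \hat{y} \ge 0.
\end{array}$$
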

Note that the matrix $\bar{A} = \begin{bmatrix} A & I \end{bmatrix}$ is an m x (n + m) matrix. The first n columns
of it correspond to the initial nonbasic variables and the last m columns are the initial basic
columns. After doing some simplex pivots, the basic and nonbasic columns get
jumbled up but we can still write the equality

$$\begin{bmatrix} A & I \end{bmatrix} = \begin{bmatrix} \bar{N} & \bar{B} \end{bmatrix}$$

with the understanding that the equality only holds after rearranging the columns
appropriately.

On the dual side, the matrix $\hat{A} = \begin{bmatrix} -I & A^T \end{bmatrix}$ is an n x (n + m) matrix. The first
n columns of it correspond to the initial basic variables (for the dual problem) and the last m
columns are the initial nonbasic columns. If the same set of pivots that were applied
to the primal problem are also applied to the dual, then the columns get rearranged
in exactly the same way as they did for the primal and we can write

$$\begin{bmatrix} -I & A^T \end{bmatrix} = \begin{bmatrix} \hat{B} & \hat{N} \end{bmatrix}$$

again with the proviso that the columns of one matrix must be rearranged in a specific
manner to bring it into exact equality with the other matrix.

Now, the primal dictionary involves the matrix $\bar{B}^{-1}\bar{N}$ whereas the dual dictionary
involves the matrix $\hat{B}^{-1}\hat{N}$. It probably doesn't seem at all obvious that these
two matrices are negative transposes of each other. To see that it is so, consider
what happens when we multiply $\bar{A}$ by $\hat{A}^T$ in both the permuted notation and the
unpermuted notation:

$$\bar{A} \hat{A}^T = \begin{bmatrix} \bar{N} & \bar{B} \end{bmatrix} \begin{bmatrix} \hat{B}^T \\ \hat{N}^T \end{bmatrix} = \bar{N} \hat{B}^T + \bar{B} \hat{N}^T$$

and

$$\bar{A} \hat{A}^T = \begin{bmatrix} A & I \end{bmatrix} \begin{bmatrix} -I \\ A \end{bmatrix} = -A + A = 0.$$

These two expressions obviously must agree so we see that

$$\bar{N} \hat{B}^T + \bar{B} \hat{N}^T = 0.$$

Putting the two terms on the opposite sides of the equality sign and multiplying on
the right by the inverse of $\hat{B}^T$ and on the left by the inverse of $\bar{B}$, we get that

$$\bar{B}^{-1} \bar{N} = -\left( \hat{B}^{-1} \hat{N} \right)^T,$$

which is the property we wished to establish.
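
The negative transpose property is easy to confirm numerically for a particular basis. The sketch below makes several assumptions for illustration: a 3 x 2 matrix A, the primal basis {1, 2, 5} (0-based `[0, 1, 4]`), exact rational arithmetic, and a helper `solve` written only for this example. On the dual side the basic columns are those indexed by the primal nonbasic set, and vice versa.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M X = rhs exactly by Gauss-Jordan elimination (M square, rhs a list of rows)."""
    n = len(M)
    aug = [[Fraction(v) for v in M[r]] + [Fraction(v) for v in rhs[r]] for r in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * bb for a, bb in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def columns(M, idx):
    return [[row[k] for k in idx] for row in M]

A = [[1, -1], [2, -1], [0, 1]]          # illustrative data (assumed)
m, n = len(A), len(A[0])
A_bar = [A[i] + [int(i == k) for k in range(m)] for i in range(m)]                 # [A  I]
A_hat = [[-int(j == k) for k in range(n)] + [A[i][j] for i in range(m)]
         for j in range(n)]                                                        # [-I  A^T]

basic, nonbasic = [0, 1, 4], [2, 3]     # primal basic / nonbasic indices, 0-based

# The product A_bar * A_hat^T vanishes identically.
product = [[sum(A_bar[i][k] * A_hat[j][k] for k in range(n + m)) for j in range(n)]
           for i in range(m)]

primal = solve(columns(A_bar, basic), columns(A_bar, nonbasic))    # B_bar^{-1} N_bar, m x n
dual = solve(columns(A_hat, nonbasic), columns(A_hat, basic))      # B_hat^{-1} N_hat, n x m
neg_transpose = [[-dual[j][i] for j in range(n)] for i in range(m)]
```

The rows of `primal` follow the order of `basic` and its columns follow the order of `nonbasic`; the dual matrix uses the opposite arrangement, so transposing and negating it lines the two up entry by entry.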


Exercises 

6.1  Consider  the  following  linear  programming  problem: 

$$\begin{array}{lrcl}
\text{maximize} & -6x_1 + 32x_2 - 9x_3 & & \\
\text{subject to} & -2x_1 + 10x_2 - 3x_3 & \le & -6 \\
 & x_1 - 7x_2 + 2x_3 & \le & 4 \\
 & x_1, x_2, x_3 & \ge & 0.
\end{array}$$

Suppose that, in solving this problem, you have arrived at the following
dictionary:

$$\begin{aligned}
\zeta &= -18 - 3x_4 + 2x_2 \\
x_3 &= 2 - x_4 + 4x_2 - 2x_5 \\
x_1 &= 2x_4 - x_2 + 3x_5.
\end{aligned}$$

(a) Which variables are basic? Which are nonbasic?

(b) Write down the vector, $x_{\mathcal{B}}^*$, of current primal basic solution values.

(c) Write down the vector, $z_{\mathcal{N}}^*$, of current dual nonbasic solution values.

(d) Write down $B^{-1}N$.

(e) Is the primal solution associated with this dictionary feasible?

(f) Is it optimal?

(g) Is it degenerate?

6.2  Consider  the  following  linear  programming  problem: 

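$$\begin{array}{lrcl}
\text{maximize} & x_1 + 2x_2 + 4x_3 + 8x_4 + 16x_5 & & \\
\text{subject to} & x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5 & \le & 2 \\
 & 7x_1 + 5x_2 - 3x_3 - 2x_4 & \le & 0 \\
 & x_1, x_2, x_3, x_4, x_5 & \ge & 0.
\end{array}$$

Consider the situation in which $x_3$ and $x_5$ are basic and all other variables
are nonbasic. Write down:

(a) $B$,

(b) $N$,

(c) $b$,

(d) $c_{\mathcal{B}}$,

(e) $c_{\mathcal{N}}$,

(f) $B^{-1}N$,

(g) $x_{\mathcal{B}}^* = B^{-1}b$,

(h) $\zeta^* = c_{\mathcal{B}}^T B^{-1} b$,

(i) $z_{\mathcal{N}}^* = (B^{-1}N)^T c_{\mathcal{B}} - c_{\mathcal{N}}$,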

(j)  The  dictionary  corresponding  to  this  basis. 

6.3  Solve  the  problem  in  Exercise  2.1  using  the  matrix  form  of  the  primal 
simplex  method. 

6.4 Solve the problem in Exercise 2.4 using the matrix form of the dual simplex
method.

6.5  Solve  the  problem  in  Exercise  2.3  using  the  two-phase  approach  in  matrix 
form. 

6.6  Find  the  dual  of  the  following  linear  program: 

$$\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & a \le Ax \le b \\
 & l \le x \le u.
\end{array}$$

6.7 (a) Let A be a given m x n matrix, c a given n-vector, and b a given
m-vector. Consider the following max-min problem:

$$\max_{x \ge 0} \min_{y \ge 0} \left( c^T x - y^T A x + b^T y \right).$$

By noting that the inner optimization can be carried out explicitly,
show that this problem can be reduced to a linear programming problem.
Write it explicitly.

(b) What linear programming problem do you get if the min and max are
interchanged?

Notes

In this chapter, we have accomplished two tasks: (1) we have expressed the
simplex method in matrix notation, and (2) we have reduced the information we
carry from iteration to iteration to simply the list of basic variables together with
current values of the primal basic variables and the dual nonbasic variables. In
particular, it is not necessary to calculate explicitly all the entries of the matrix
$B^{-1}N$.

What's in a name? There are times when one thing has two names. So far in
this book, we have discussed essentially only one algorithm: the simplex method
(assuming, of course, that specific pivot rules have been settled on). But this one
algorithm is sometimes referred to as the simplex method and at other times it is
referred to as the revised simplex method. The distinction being made with this new
name has nothing to do with the algorithm. Rather it refers to the specifics of an
implementation. Indeed, an implementation of the simplex method that avoids explicit
calculation of the matrix $B^{-1}N$ is referred to as an implementation of the revised
simplex method. We shall see in Chapter 8 why it is beneficial to avoid computing
$B^{-1}N$.


CHAPTER  7 


Sensitivity  and  Parametric  Analyses 


In this chapter, we consider two related subjects. The first, called sensitivity
analysis (or postoptimality analysis) addresses the following question: having
found  an  optimal  solution  to  a  given  linear  programming  problem,  how  much  can 
we  change  the  data  and  have  the  current  partition  into  basic  and  nonbasic  variables 
remain  optimal?  The  second  subject  addresses  situations  in  which  one  wishes  to 
solve  not  just  one  linear  program,  but  a  whole  family  of  problems  parametrized  by 
a  single  real  variable. 

We  shall  study  parametric  analysis  in  a  very  specific  context  in  which  we  wish 
to  find  the  optimal  solution  to  a  given  linear  programming  problem  by  starting  from 
a  problem  whose  solution  is  trivially  known  and  then  deforming  this  problem  back 
to  the  original  problem,  maintaining  as  we  go  optimality  of  the  current  solution. 
The  result  of  this  deformation  approach  to  solving  a  linear  programming  problem 
is  a  new  variant  of  the  simplex  method,  which  is  called  the  parametric  self-dual 
simplex  method.  We  will  see  in  later  chapters  that  this  variant  of  the  simplex  method 
resembles,  in  certain  respects,  the  interior-point  methods  that  we  shall  study. 

1.  Sensitivity  Analysis 

One  often  needs  to  solve  not  just  one  linear  programming  problem  but  several 
closely  related  problems.  There  are  many  reasons  that  this  need  might  arise.  For 
example,  the  data  that  define  the  problem  may  have  been  rather  uncertain  and  one 
may  wish  to  consider  various  possible  data  scenarios.  Or  perhaps  the  data  are  known 
accurately  but  change  from  day  to  day,  and  the  problem  must  be  resolved  for  each 
new  day.  Whatever  the  reason,  this  situation  is  quite  common.  So  one  is  led  to 
ask  whether  it  is  possible  to  exploit  the  knowledge  of  a  previously  obtained  optimal 
solution  to  obtain  more  quickly  the  optimal  solution  to  the  problem  at  hand.  Of 
course,  the  answer  is  often  yes,  and  this  is  the  subject  of  this  section. 

We shall treat a number of possible situations. All of them assume that a problem
has been solved to optimality. This means that we have at our disposal the final,
optimal dictionary:

$$\zeta = \zeta^* - (z_{\mathcal{N}}^*)^T x_{\mathcal{N}}$$
$$x_{\mathcal{B}} = x_{\mathcal{B}}^* - B^{-1}N x_{\mathcal{N}}.$$

Suppose we wish to change the objective coefficients from $c$ to, say, $\tilde{c}$. It is natural to
ask how the dictionary at hand could be adjusted to become a valid dictionary for the
new problem. That is, we want to maintain the current classification of the variables
into basic and nonbasic variables and simply adjust $\zeta^*$, $z_{\mathcal{N}}^*$, and $x_{\mathcal{B}}^*$ appropriately.
Recall from (6.7) to (6.9) that

$$x_{\mathcal{B}}^* = B^{-1}b,$$
$$z_{\mathcal{N}}^* = (B^{-1}N)^T c_{\mathcal{B}} - c_{\mathcal{N}},$$
$$\zeta^* = c_{\mathcal{B}}^T B^{-1} b.$$

Hence, the change from $c$ to $\tilde{c}$ requires us to recompute $z_{\mathcal{N}}^*$ and $\zeta^*$, but $x_{\mathcal{B}}^*$ remains
unchanged. Therefore, after recomputing $z_{\mathcal{N}}^*$ and $\zeta^*$, the new dictionary is still
primal feasible, and so there is no need for a Phase I procedure: we can jump straight
into the primal simplex method, and if $\tilde{c}$ is not too different from $c$, we can expect
to get to the new optimal solution in a relatively small number of steps.
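
The recomputation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reduced_costs` and the list-based data layout are hypothetical, and the numbers are taken from the small three-constraint problem analyzed under Ranging later in this section (basic variables $x_3, x_1, x_5$; nonbasic variables $x_2, x_4, x_6$). The new objective coefficient of 7 on $x_1$ is likewise made up for the illustration.

```python
from fractions import Fraction

def reduced_costs(BinvN, x_B, c_B, c_N):
    """Return (z_N, zeta) with z_N = (B^{-1}N)^T c_B - c_N and zeta = c_B^T x_B."""
    m, n = len(BinvN), len(c_N)
    z_N = [sum(Fraction(BinvN[i][j]) * c_B[i] for i in range(m)) - c_N[j]
           for j in range(n)]
    zeta = sum(Fraction(c_B[i]) * x_B[i] for i in range(m))
    return z_N, zeta

# Data read off an optimal dictionary (rows: x3, x1, x5; columns: x2, x4, x6).
BinvN = [[-1, -3, 2],
         [2, 2, -1],
         [-5, -2, 0]]
x_B = [1, 2, 1]  # B^{-1} b, which does not depend on c

# Original objective: c_B = (3, 5, 0), c_N = (4, 0, 0).
z_old, zeta_old = reduced_costs(BinvN, x_B, c_B=[3, 5, 0], c_N=[4, 0, 0])

# Hypothetical new objective: the coefficient of x1 is raised from 5 to 7.
z_new, zeta_new = reduced_costs(BinvN, x_B, c_B=[3, 7, 0], c_N=[4, 0, 0])

print(z_old, zeta_old)  # every entry of z_old is nonnegative: still optimal
print(z_new, zeta_new)  # a negative entry shows where the primal simplex method resumes
```

Because `x_B` is never touched, the dictionary stays primal feasible; only the signs in `z_new` decide whether further primal pivots are needed.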

Now suppose that instead of changing $c$, we wish to change only the right-hand
side $b$. In this case, we see that we need to recompute $x_{\mathcal{B}}^*$ and $\zeta^*$, but $z_{\mathcal{N}}^*$ remains
unchanged. Hence, the new dictionary will be dual feasible, and so we can apply
the dual simplex method to arrive at the new optimal solution fairly directly.

Therefore, changing just the objective function or just the right-hand side results
in a new dictionary having nice feasibility properties. What if we need or want to
change some (or all) entries in both the objective function and the right-hand side
and maybe even the constraint matrix too? In this case, everything changes:
$\zeta^*$, $z_{\mathcal{N}}^*$, $x_{\mathcal{B}}^*$. Even the entries in $B$ and $N$ change. Nonetheless, as long as the new
basis matrix $B$ is nonsingular, we can make a new dictionary that preserves the old
classification into basic and nonbasic variables. The new dictionary will most likely
be neither primal feasible nor dual feasible, but if the changes in the data are fairly
small in magnitude, one would still expect that this starting dictionary will get us to
an optimal solution in fewer iterations than simply starting from scratch. While there
is no guarantee that any of these so-called warm-starts will end up in fewer iterations
to optimality, extensive empirical evidence indicates that this procedure often makes
a substantial improvement: sometimes the warm-started problems solve in as little
as 1% of the time it takes to solve the original problem.

1.1.  Ranging.  Often  one  does  not  wish  to  solve  a  modification  of  the  original 
problem,  but  instead  just  wants  to  ask  a  hypothetical  question: 

If I were to change the objective function by increasing or
decreasing one of the objective coefficients a small amount, how
much could I increase/decrease it without changing the optimality
of my current basis?

To study this question, let us suppose that $c$ gets changed to $c + t\Delta c$, where $t$ is a
real number and $\Delta c$ is a given vector (which is often all zeros except for a one in a
single entry, but we don't need to restrict the discussion to this case). It is easy to
see that $z_{\mathcal{N}}^*$ gets incremented by $t\Delta z_{\mathcal{N}}$, where

$$\Delta z_{\mathcal{N}} = (B^{-1}N)^T \Delta c_{\mathcal{B}} - \Delta c_{\mathcal{N}}. \tag{7.1}$$

Hence, the current basis will remain dual feasible as long as

$$z_{\mathcal{N}}^* + t\Delta z_{\mathcal{N}} \ge 0. \tag{7.2}$$

We've manipulated this type of inequality many times before, and so it should be
clear that, for $t > 0$, this inequality will remain valid as long as

$$t \le \left( \max_{j \in \mathcal{N}} -\frac{\Delta z_j}{z_j^*} \right)^{-1}.$$

Similar manipulations show that, for $t < 0$, the lower bound is

$$t \ge \left( \min_{j \in \mathcal{N}} -\frac{\Delta z_j}{z_j^*} \right)^{-1}.$$

Combining these two inequalities, we see that $t$ must lie in the interval

$$\left( \min_{j \in \mathcal{N}} -\frac{\Delta z_j}{z_j^*} \right)^{-1} \le t \le \left( \max_{j \in \mathcal{N}} -\frac{\Delta z_j}{z_j^*} \right)^{-1}.$$

Let  us  illustrate  these  calculations  with  an  example.  Consider  the  following 
linear  programming  problem: 


$$\begin{array}{rl}
\text{maximize} & 5x_1 + 4x_2 + 3x_3 \\
\text{subject to} & 2x_1 + 3x_2 + x_3 \le 5 \\
 & 4x_1 + x_2 + 2x_3 \le 11 \\
 & 3x_1 + 4x_2 + 2x_3 \le 8 \\
 & x_1, x_2, x_3 \ge 0.
\end{array}$$

The optimal dictionary for this problem is given by

$$\begin{aligned}
\zeta &= 13 - 3x_2 - x_4 - x_6 \\
x_3 &= 1 + x_2 + 3x_4 - 2x_6 \\
x_1 &= 2 - 2x_2 - 2x_4 + x_6 \\
x_5 &= 1 + 5x_2 + 2x_4.
\end{aligned}$$

The optimal basis is $\mathcal{B} = \{3, 1, 5\}$. Suppose we want to know how much the
coefficient of 5 on $x_1$ in the objective function can change without altering the optimality
of this basis. From the statement of the problem, we see that

$$c = \begin{bmatrix} 5 & 4 & 3 & 0 & 0 & 0 \end{bmatrix}^T.$$

Since we are interested in changes in the first coefficient, we put

$$\Delta c = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}^T.$$

We partition $\Delta c$ according to the final (optimal) basis. Hence, we have

$$\Delta c_{\mathcal{B}} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
\quad \text{and} \quad
\Delta c_{\mathcal{N}} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$

Next, we need to compute $\Delta z_{\mathcal{N}}$ using (7.1). We could compute $B^{-1}N$ from scratch,
but it is easier to extract it from the constraint coefficients in the final dictionary.
Indeed,

$$-B^{-1}N = \begin{bmatrix} 1 & 3 & -2 \\ -2 & -2 & 1 \\ 5 & 2 & 0 \end{bmatrix}.$$

Hence, from (7.1) we see that

$$\Delta z_{\mathcal{N}} = \begin{bmatrix} 2 \\ 2 \\ -1 \end{bmatrix}.$$

Now, (7.2) gives the condition on $t$. Writing it out componentwise, we get

$$3 + 2t \ge 0, \quad 1 + 2t \ge 0, \quad \text{and} \quad 1 - t \ge 0.$$

These three inequalities, which must all hold, can be summarized by saying that

$$-\frac{1}{2} \le t \le 1.$$

Hence, in terms of the coefficient on $x_1$, we finally see that it can range from 4.5
to 6.
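
The ranging computation for this example can be checked numerically. The sketch below is an illustration, not the book's software: the helper name `ranging_interval` is hypothetical, and it bounds $t$ directly from each inequality $z_j^* + t\Delta z_j \ge 0$ instead of using the reciprocal form, which gives the same interval whenever every $z_j^*$ is positive.

```python
from fractions import Fraction

def ranging_interval(z_star, dz):
    """Return (lo, hi) with z_star[j] + t*dz[j] >= 0 for all j iff lo <= t <= hi.

    None marks an unbounded end. Assumes every z_star[j] >= 0.
    """
    lo, hi = None, None
    for zj, dj in zip(z_star, dz):
        zj, dj = Fraction(zj), Fraction(dj)
        if dj > 0:                      # this inequality bounds t from below
            bound = -zj / dj
            lo = bound if lo is None else max(lo, bound)
        elif dj < 0:                    # this inequality bounds t from above
            bound = -zj / dj
            hi = bound if hi is None else min(hi, bound)
    return lo, hi

# -B^{-1}N read off the final dictionary (rows: x3, x1, x5; columns: x2, x4, x6).
neg_BinvN = [[1, 3, -2],
             [-2, -2, 1],
             [5, 2, 0]]
dc_B = [0, 1, 0]      # change in c restricted to the basic variables
dc_N = [0, 0, 0]      # change in c restricted to the nonbasic variables
z_star = [3, 1, 1]    # current dual slacks on x2, x4, x6

# Equation (7.1): dz_N = (B^{-1}N)^T dc_B - dc_N.
dz_N = [sum(-neg_BinvN[i][j] * dc_B[i] for i in range(3)) - dc_N[j]
        for j in range(3)]
lo, hi = ranging_interval(z_star, dz_N)
print(dz_N, lo, hi)
```

Adding `lo` and `hi` to the original coefficient 5 reproduces the range from 4.5 to 6.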

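Now suppose we change $b$ to $b + t\Delta b$ and ask how much $t$ can change before
the current basis becomes nonoptimal. In this case, $z_{\mathcal{N}}^*$ does not change, but $x_{\mathcal{B}}^*$ gets
incremented by $t\Delta x_{\mathcal{B}}$, where

$$\Delta x_{\mathcal{B}} = B^{-1}\Delta b.$$

Hence, the current basis will remain optimal as long as $t$ lies in the interval

$$\left( \min_{i \in \mathcal{B}} -\frac{\Delta x_i}{x_i^*} \right)^{-1} \le t \le \left( \max_{i \in \mathcal{B}} -\frac{\Delta x_i}{x_i^*} \right)^{-1}.$$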

2.  Parametric  Analysis  and  the  Homotopy  Method 

In  this  section,  we  illustrate  the  notion  of  parametric  analysis  by  applying  a 
technique  called  the  homotopy  method  to  get  a  new  algorithm  for  solving  linear 
programming  problems.  The  homotopy  method  is  a  general  technique  in  which 
one  creates  a  continuous  deformation  that  changes  a  given  difficult  problem  into 
a  related  but  trivially  solved  problem  and  then  attempts  to  work  backwards  from 
the  trivial  problem  to  the  difficult  problem  by  solving  (hopefully  without  too  much 
effort)  all  the  problems  in  between.  Of  course,  there  is  a  continuum  of  problems 
between the hard one and the trivial one, and so we shouldn't expect that this technique
will be effective in every situation; but for linear programming and for many
other problem domains, it turns out to yield efficient algorithms.

We start with an example. Suppose we wish to solve the following linear programming
problem:

$$\begin{array}{rrcl}
\text{maximize} & -2x_1 + 3x_2 & & \\
\text{subject to} & -x_1 + x_2 & \le & -1 \\
 & -x_1 - 2x_2 & \le & -2 \\
 & x_2 & \le & 1 \\
 & x_1, x_2 & \ge & 0.
\end{array}$$

The starting dictionary is

$$\begin{aligned}
\zeta &= -2x_1 - (-3)x_2 \\
x_3 &= -1 + x_1 - x_2 \\
x_4 &= -2 + x_1 + 2x_2 \\
x_5 &= 1 - x_2.
\end{aligned}$$

This dictionary is neither primal nor dual feasible. Let's perturb it by adding a
positive real number $\mu$ to each right-hand side and subtracting it from each objective
function coefficient. We now arrive at a family of dictionaries, parametrized by $\mu$:

$$\begin{aligned}
\zeta &= -(2 + \mu)x_1 - (-3 + \mu)x_2 \\
x_3 &= -1 + \mu + x_1 - x_2 \\
x_4 &= -2 + \mu + x_1 + 2x_2 \\
x_5 &= 1 + \mu - x_2.
\end{aligned} \tag{7.3}$$

Clearly, for $\mu$ sufficiently large, specifically $\mu \ge 3$, this dictionary is both primal
and dual feasible. Hence, the associated solution $x = [0, 0, -1 + \mu, -2 + \mu, 1 + \mu]$
is optimal. Starting with $\mu$ large, we reduce it as much as we can while keeping
dictionary (7.3) optimal. This dictionary will become nonoptimal as soon as $\mu < 3$,
since the associated dual variable, whose value is $-3 + \mu$, will become negative. In other
words, the coefficient of $x_2$, which is $3 - \mu$, will become positive. This change of
sign on the coefficient of $x_2$ suggests that we make a primal pivot in which $x_2$ enters
the basis. The usual ratio test (using the specific value of $\mu = 3$) indicates that $x_3$
must be the leaving variable. Making the pivot, we get

$$\begin{aligned}
\zeta &= -3 + 4\mu - \mu^2 - (-1 + 2\mu)x_1 - (3 - \mu)x_3 \\
x_2 &= -1 + \mu + x_1 - x_3 \\
x_4 &= -4 + 3\mu + 3x_1 - 2x_3 \\
x_5 &= 2 - x_1 + x_3.
\end{aligned}$$

This dictionary is optimal as long as

$$-1 + 2\mu \ge 0, \quad 3 - \mu \ge 0, \quad -1 + \mu \ge 0, \quad -4 + 3\mu \ge 0.$$

These inequalities reduce to

$$\frac{4}{3} \le \mu \le 3.$$

So now we can reduce $\mu$ from its current value of 3 down to 4/3. If we reduce it
below 4/3, the primal feasibility inequality, $-4 + 3\mu \ge 0$, becomes violated. This
violation suggests that we perform a dual pivot with $x_4$ serving as the leaving variable.
The usual (dual) ratio test (with $\mu = 4/3$) then tells us that $x_1$ must be the
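entering variable. Doing the pivot, we get

$$\begin{aligned}
\zeta &= -\tfrac{5}{3} + \tfrac{1}{3}\mu + \mu^2 - \left(-\tfrac{1}{3} + \tfrac{2}{3}\mu\right)x_4 - \left(\tfrac{7}{3} + \tfrac{1}{3}\mu\right)x_3 \\
x_2 &= \tfrac{1}{3} + \tfrac{1}{3}x_4 - \tfrac{1}{3}x_3 \\
x_1 &= \tfrac{4}{3} - \mu + \tfrac{1}{3}x_4 + \tfrac{2}{3}x_3 \\
x_5 &= \tfrac{2}{3} + \mu - \tfrac{1}{3}x_4 + \tfrac{1}{3}x_3.
\end{aligned}$$

Now the conditions for optimality are

$$-\tfrac{1}{3} + \tfrac{2}{3}\mu \ge 0, \quad \tfrac{7}{3} + \tfrac{1}{3}\mu \ge 0, \quad \tfrac{4}{3} - \mu \ge 0, \quad \tfrac{2}{3} + \mu \ge 0,$$

which reduce to

$$\frac{1}{2} \le \mu \le \frac{4}{3}.$$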

For the next iteration, we reduce $\mu$ to 1/2 and see that the inequality that becomes
binding is the dual feasibility inequality

$$-\tfrac{1}{3} + \tfrac{2}{3}\mu \ge 0.$$

Hence, we do a primal pivot with $x_4$ entering the basis. The leaving variable is $x_5$,
and the new dictionary is

$$\begin{aligned}
\zeta &= -1 - \mu^2 - (1 - 2\mu)x_5 - (2 + \mu)x_3 \\
x_2 &= 1 + \mu - x_5 \\
x_1 &= 2 - x_5 + x_3 \\
x_4 &= 2 + 3\mu - 3x_5 + x_3.
\end{aligned}$$

For this dictionary, the range of optimality is given by

$$1 - 2\mu \ge 0, \quad 2 + \mu \ge 0, \quad 1 + \mu \ge 0, \quad 2 + 3\mu \ge 0,$$

which reduces to

$$-\frac{2}{3} \le \mu \le \frac{1}{2}.$$

This range covers $\mu = 0$, and so now we can set $\mu$ to 0 and get an optimal dictionary
for our original problem:

$$\begin{aligned}
\zeta &= -1 - x_5 - 2x_3 \\
x_2 &= 1 - x_5 \\
x_1 &= 2 - x_5 + x_3 \\
x_4 &= 2 - 3x_5 + x_3.
\end{aligned}$$
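
The four ranges of optimality found in this example can be re-derived mechanically. In the sketch below (an illustration only; the helper `mu_range` and the encoding of each condition $a + b\mu \ge 0$ as a pair `(a, b)` are assumptions made for this example), each dictionary is represented by its list of dual and primal feasibility conditions.

```python
from fractions import Fraction as Fr

def mu_range(conditions):
    """conditions: pairs (a, b) meaning a + b*mu >= 0. Returns (lo, hi); None = unbounded."""
    lo, hi = None, None
    for a, b in conditions:
        a, b = Fr(a), Fr(b)
        if b > 0:                       # condition gives a lower bound on mu
            lo = -a / b if lo is None else max(lo, -a / b)
        elif b < 0:                     # condition gives an upper bound on mu
            hi = -a / b if hi is None else min(hi, -a / b)
        elif a < 0:
            raise ValueError("condition can never hold")
    return lo, hi

# Objective-row conditions first, then right-hand-side conditions.
dictionaries = [
    [(2, 1), (-3, 1), (-1, 1), (-2, 1), (1, 1)],                # dictionary (7.3)
    [(-1, 2), (3, -1), (-1, 1), (-4, 3), (2, 0)],               # after x2 enters, x3 leaves
    [(Fr(-1, 3), Fr(2, 3)), (Fr(7, 3), Fr(1, 3)),
     (Fr(1, 3), 0), (Fr(4, 3), -1), (Fr(2, 3), 1)],             # after x1 enters, x4 leaves
    [(1, -2), (2, 1), (1, 1), (2, 0), (2, 3)],                  # after x4 enters, x5 leaves
]
ranges = [mu_range(d) for d in dictionaries]
for r in ranges:
    print(r)
```

Consecutive intervals share an endpoint, which is exactly the value of $\mu$ at which each pivot was made.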

The algorithm we have just illustrated is called the parametric self-dual simplex
method.$^1$ We shall often refer to it more simply as the self-dual simplex method. It
has some attractive features. First, in contrast to the methods presented earlier, this
algorithm does not require a separate Phase I procedure. It starts with any problem,
be it primal infeasible, dual infeasible, or both, and it systematically performs pivots
(whether primal or dual) until it finds an optimal solution.

$^1$ In the first edition, this method was called the primal-dual simplex method.

A second feature is that a trivial modification of the algorithm can avoid entirely
ever encountering a degenerate dictionary. Indeed, suppose that, instead of
adding/subtracting $\mu$ from each of the right-hand sides and objective coefficients,
we add/subtract a positive constant times $\mu$. Suppose further that the positive constant
is different in each addition/subtraction. In fact, suppose that they are chosen
independently from, say, a uniform distribution on [1/2, 3/2]. Then with probability
one, the algorithm will produce no primal degenerate or dual degenerate dictionary
in any iteration. In Chapter 3, we discussed perturbing the right-hand side of a linear
programming problem to avoid degeneracy in the primal simplex method, but back
then the perturbation changed the problem. The present perturbation does not in any
way affect the problem that is solved.

With the above randomization trick to resolve the degeneracy issue, the analysis
of the convergence of the algorithm is straightforward. Indeed, let us consider a
problem that is feasible and bounded (the questions regarding feasibility and boundedness
are addressed in Exercise 7.10). For each nondegenerate pivot, the next value
of $\mu$ will be strictly less than the current value. Since each of these $\mu$ values is
determined by a partition of the variables into basics and nonbasics and there are only
a finite number of such partitions, it follows that the method must reach a partition
with a negative $\mu$ value in a finite number of steps.

3.  The  Parametric  Self-Dual  Simplex  Method 

In the previous section, we illustrated on an example a new algorithm for solving
linear programming problems, called the parametric self-dual simplex method.
In this section, we shall lay out the algorithm in matrix notation.

Our starting point is an initial dictionary as written in (6.10) and transcribed
here for convenience:

$$\zeta = \zeta^* - (z_{\mathcal{N}}^*)^T x_{\mathcal{N}}$$
$$x_{\mathcal{B}} = x_{\mathcal{B}}^* - B^{-1}N x_{\mathcal{N}},$$

where

$$x_{\mathcal{B}}^* = B^{-1}b$$
$$z_{\mathcal{N}}^* = (B^{-1}N)^T c_{\mathcal{B}} - c_{\mathcal{N}}$$
$$\zeta^* = c_{\mathcal{B}}^T x_{\mathcal{B}}^* = c_{\mathcal{B}}^T B^{-1} b.$$

Generally speaking, we don't expect this dictionary to be either primal or dual feasible.
So we perturb it by adding essentially arbitrary perturbations $\bar{x}_{\mathcal{B}}$ and $\bar{z}_{\mathcal{N}}$ to
$x_{\mathcal{B}}^*$ and $z_{\mathcal{N}}^*$, respectively:

$$\zeta = \zeta^* - (z_{\mathcal{N}}^* + \mu \bar{z}_{\mathcal{N}})^T x_{\mathcal{N}}$$
$$x_{\mathcal{B}} = (x_{\mathcal{B}}^* + \mu \bar{x}_{\mathcal{B}}) - B^{-1}N x_{\mathcal{N}}.$$

We assume that the perturbations are all strictly positive,

$$\bar{x}_{\mathcal{B}} > 0 \quad \text{and} \quad \bar{z}_{\mathcal{N}} > 0,$$

so that by taking $\mu$ sufficiently large the perturbed dictionary will be optimal. (Actually,
to guarantee optimality for large $\mu$, we only need to perturb those primal and
dual variables that are negative in the initial dictionary.)

The parametric self-dual simplex method generates a sequence of dictionaries
having the same form as the initial one, except that, of course, the basis $\mathcal{B}$ will change,
and hence all the data vectors ($z_{\mathcal{N}}^*$, $\bar{z}_{\mathcal{N}}$, $x_{\mathcal{B}}^*$, and $\bar{x}_{\mathcal{B}}$) will change too. Additionally,
the current value of the objective function $\zeta^*$ will, with the exception of the first
dictionary, depend on $\mu$.

One step of the self-dual simplex method can be described as follows. First, we
compute the smallest value of $\mu$ for which the current dictionary is optimal. Letting
$\mu^*$ denote this value, we see that

$$\mu^* = \min\{\mu : z_{\mathcal{N}}^* + \mu \bar{z}_{\mathcal{N}} \ge 0 \text{ and } x_{\mathcal{B}}^* + \mu \bar{x}_{\mathcal{B}} \ge 0\}.$$

There is either a $j \in \mathcal{N}$ for which $z_j^* + \mu^* \bar{z}_j = 0$ or an $i \in \mathcal{B}$ for which $x_i^* + \mu^* \bar{x}_i = 0$
(if there are multiple choices, an arbitrary selection is made). If the blocking
constraint corresponds to a nonbasic index $j \in \mathcal{N}$, then we do one step of the
primal simplex method. If, on the other hand, it corresponds to a basic index $i \in \mathcal{B}$,
then we do one step of the dual simplex method.

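Suppose, for definiteness, that the blocking constraint corresponds to an index
$j \in \mathcal{N}$. Then, to do a primal pivot, we declare $x_j$ to be the entering variable, and
we compute the step direction for the primal basic variables as the $j$th column of
the dictionary. That is,

$$\Delta x_{\mathcal{B}} = B^{-1}N e_j.$$

Using this step direction, we find an index $i \in \mathcal{B}$ that achieves the maximal value of
$\Delta x_i / (x_i^* + \mu^* \bar{x}_i)$. Variable $x_i$ is the leaving variable. After figuring out the leaving
variable, the step direction vector for the dual nonbasic variables is just the negative
of the $i$th row of the dictionary

$$\Delta z_{\mathcal{N}} = -(B^{-1}N)^T e_i.$$

After computing the primal and dual step directions, it is easy to see that the step
length adjustments are given by

$$t = \frac{x_i^*}{\Delta x_i}, \qquad \bar{t} = \frac{\bar{x}_i}{\Delta x_i}, \qquad
s = \frac{z_j^*}{\Delta z_j}, \qquad \bar{s} = \frac{\bar{z}_j}{\Delta z_j}.$$

And from these, it is easy to write down the new solution vectors:

$$x_j^* \leftarrow t, \qquad \bar{x}_j \leftarrow \bar{t}, \qquad z_i^* \leftarrow s, \qquad \bar{z}_i \leftarrow \bar{s},$$
$$x_{\mathcal{B}}^* \leftarrow x_{\mathcal{B}}^* - t\Delta x_{\mathcal{B}}, \qquad \bar{x}_{\mathcal{B}} \leftarrow \bar{x}_{\mathcal{B}} - \bar{t}\Delta x_{\mathcal{B}},$$
$$z_{\mathcal{N}}^* \leftarrow z_{\mathcal{N}}^* - s\Delta z_{\mathcal{N}}, \qquad \bar{z}_{\mathcal{N}} \leftarrow \bar{z}_{\mathcal{N}} - \bar{s}\Delta z_{\mathcal{N}}.$$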

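Finally, the basis is updated by adding the entering variable and removing the leaving
variable

$$\mathcal{B} \leftarrow \mathcal{B} \setminus \{i\} \cup \{j\}.$$

The algorithm is summarized in Figure 7.1.

Compute $\mu^* = \max\left( \max_{j \in \mathcal{N},\ \bar{z}_j > 0} -\dfrac{z_j^*}{\bar{z}_j},\ \max_{i \in \mathcal{B},\ \bar{x}_i > 0} -\dfrac{x_i^*}{\bar{x}_i} \right)$
While ($\mu^* > 0$) {
    If max is achieved by
        $j \in \mathcal{N}$:
            $\Delta x_{\mathcal{B}} = B^{-1}N e_j$
            pick $i \in \operatorname{argmax}_{i \in \mathcal{B}} \dfrac{\Delta x_i}{x_i^* + \mu^* \bar{x}_i}$
            $\Delta z_{\mathcal{N}} = -(B^{-1}N)^T e_i$
        $i \in \mathcal{B}$:
            $\Delta z_{\mathcal{N}} = -(B^{-1}N)^T e_i$
            pick $j \in \operatorname{argmax}_{j \in \mathcal{N}} \dfrac{\Delta z_j}{z_j^* + \mu^* \bar{z}_j}$
            $\Delta x_{\mathcal{B}} = B^{-1}N e_j$
    $t = \dfrac{x_i^*}{\Delta x_i}$, $\quad \bar{t} = \dfrac{\bar{x}_i}{\Delta x_i}$
    $s = \dfrac{z_j^*}{\Delta z_j}$, $\quad \bar{s} = \dfrac{\bar{z}_j}{\Delta z_j}$
    $x_j^* \leftarrow t$, $\quad \bar{x}_j \leftarrow \bar{t}$
    $z_i^* \leftarrow s$, $\quad \bar{z}_i \leftarrow \bar{s}$
    $x_{\mathcal{B}}^* \leftarrow x_{\mathcal{B}}^* - t\Delta x_{\mathcal{B}}$, $\quad \bar{x}_{\mathcal{B}} \leftarrow \bar{x}_{\mathcal{B}} - \bar{t}\Delta x_{\mathcal{B}}$
    $z_{\mathcal{N}}^* \leftarrow z_{\mathcal{N}}^* - s\Delta z_{\mathcal{N}}$, $\quad \bar{z}_{\mathcal{N}} \leftarrow \bar{z}_{\mathcal{N}} - \bar{s}\Delta z_{\mathcal{N}}$
    $\mathcal{B} \leftarrow \mathcal{B} \setminus \{i\} \cup \{j\}$
    Recompute $\mu^*$ as above
}

Figure 7.1. The parametric self-dual simplex method.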
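
To make the steps concrete, here is a minimal Python sketch of the method in dictionary form. It is not the software that accompanies the book, and several choices are assumptions made for this illustration: the function name `self_dual_simplex`, the use of exact `Fraction` arithmetic, unit perturbations (all entries of the two perturbation vectors start at 1), an explicit dictionary matrix `D` holding $B^{-1}N$ that is updated by a full pivot instead of the step-length formulas, and first-found tie-breaking with no anti-cycling safeguard. When a ratio test finds no candidate it simply raises an error; the precise infeasibility and unboundedness conditions are the subject of Exercise 7.10.

```python
from fractions import Fraction as F

def self_dual_simplex(A, b, c):
    """Maximize c^T x subject to A x <= b, x >= 0. Returns (x, objective value)."""
    m, n = len(A), len(c)
    D = [[F(v) for v in row] for row in A]        # B^{-1}N for the initial slack basis
    xs, xb = [F(v) for v in b], [F(1)] * m        # x*_B and its perturbation
    zs, zb = [F(-v) for v in c], [F(1)] * n       # z*_N and its perturbation
    N, B = list(range(n)), list(range(n, n + m))  # nonbasic / basic variable indices
    while True:
        # mu* = smallest mu for which the current dictionary is optimal.
        cand = [(-zs[j] / zb[j], 0, j) for j in range(n) if zb[j] > 0]
        cand += [(-xs[i] / xb[i], 1, i) for i in range(m) if xb[i] > 0]
        mu, kind, idx = max(cand, default=(F(0), 0, 0))
        if mu <= 0:
            break                                 # optimal at mu = 0
        if kind == 0:                             # blocking j in N: primal pivot
            k = idx
            rows = [i for i in range(m) if D[i][k] > 0]
            if not rows:
                raise ValueError("ratio test failed: no leaving variable")
            r = min(rows, key=lambda i: (xs[i] + mu * xb[i]) / D[i][k])
        else:                                     # blocking i in B: dual pivot
            r = idx
            cols = [j for j in range(n) if D[r][j] < 0]
            if not cols:
                raise ValueError("ratio test failed: no entering variable")
            k = min(cols, key=lambda j: (zs[j] + mu * zb[j]) / -D[r][j])
        # Pivot: nonbasic position k enters, basic position r leaves.
        p = D[r][k]
        new_r = [v / p for v in D[r]]
        new_r[k] = 1 / p
        for i in range(m):
            if i != r:
                f = D[i][k]
                xs[i] -= f * xs[r] / p
                xb[i] -= f * xb[r] / p
                D[i] = [D[i][j] - f * new_r[j] for j in range(n)]
                D[i][k] = -f / p
        for z in (zs, zb):                        # same update for both dual vectors
            f = z[k]
            for j in range(n):
                z[j] -= f * new_r[j]
            z[k] = -f / p
        xs[r], xb[r] = xs[r] / p, xb[r] / p
        D[r] = new_r
        B[r], N[k] = N[k], B[r]
    x = [F(0)] * (n + m)
    for i in range(m):
        x[B[i]] = xs[i]
    return x[:n], sum(F(cj) * xj for cj, xj in zip(c, x))

# The example from Section 2.
x_opt, value = self_dual_simplex([[-1, 1], [-1, -2], [0, 1]], [-1, -2, 1], [-2, 3])
print(x_opt, value)
```

On the example from Section 2 this sketch performs the same three pivots as the text, at $\mu = 3$, $4/3$, and $1/2$. With unit perturbations, ties in the ratio tests can produce degenerate pivots; the randomized perturbation described in Section 2 is the remedy suggested there.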


Exercises 

In solving the following problems, the advanced pivot tool can be used to check
your arithmetic:

www.princeton.edu/~rvdb/JAVA/pivot/advanced.html

7.1 The final dictionary for

$$\begin{array}{rl}
\text{maximize} & x_1 + 2x_2 + x_3 + x_4 \\
\text{subject to} & 2x_1 + x_2 + 5x_3 + x_4 \le 8 \\
 & 2x_1 + 2x_2 + 4x_4 \le 12 \\
 & 3x_1 + x_2 + 2x_3 \le 18 \\
 & x_1, x_2, x_3, x_4 \ge 0
\end{array}$$

is

$$\begin{aligned}
\zeta &= 12.4 - 1.2x_1 - 0.2x_5 - 0.9x_6 - 2.8x_4 \\
x_2 &= 6 - x_1 - 0.5x_6 - 2x_4 \\
x_3 &= 0.4 - 0.2x_1 - 0.2x_5 + 0.1x_6 + 0.2x_4 \\
x_7 &= 11.2 - 1.6x_1 + 0.4x_5 + 0.3x_6 + 1.6x_4
\end{aligned}$$

(the last three variables are the slack variables).

(a) What will be an optimal solution to the problem if the objective function
is changed to

$$3x_1 + 2x_2 + x_3 + x_4?$$

(b) What will be an optimal solution to the problem if the objective function
is changed to

$$x_1 + 2x_2 + 0.6x_3 + x_4?$$

(c) What will be an optimal solution to the problem if the second constraint's
right-hand side is changed to 26?

7.2  For  each  of  the  objective  coefficients  in  the  problem  in  Exercise  7.1,  find 
the  range  of  values  for  which  the  final  dictionary  will  remain  optimal. 

7.3 Consider the following dictionary which arises in solving a problem using
the self-dual simplex method:

$$\begin{aligned}
\zeta &= -3 - (-1 + 2\mu)x_1 - (3 - \mu)x_3 \\
x_2 &= -1 + \mu + x_1 - x_3 \\
x_4 &= -4 + 3\mu + 3x_1 - 2x_3 \\
x_5 &= 2 + x_1 + x_3.
\end{aligned}$$

(a) For which values of $\mu$ is the current dictionary optimal?

(b) For the next pivot in the self-dual simplex method, identify the entering
and the leaving variable.

7.4  Solve  the  linear  program  given  in  Exercise  2.3  using  the  self-dual  simplex 
method.  Hint:  It  is  easier  to  use  dictionary  notation  than  matrix  notation. 

7.5  Solve  the  linear  program  given  in  Exercise  2.4  using  the  self-dual  simplex 
method.  Hint:  It  is  easier  to  use  dictionary  notation  than  matrix  notation. 

7.6  Solve  the  linear  program  given  in  Exercise  2.6  using  the  self-dual  simplex 
method.  Hint:  It  is  easier  to  use  dictionary  notation  than  matrix  notation. 

7.7 Using today's date (MMYY) for the seed value, solve ten problems using
the self-dual simplex method:

www.princeton.edu/~rvdb/JAVA/pivot/pd1phase.html



7.8 Use the self-dual simplex method to solve the following problem:

$$\begin{array}{rl}
\text{maximize} & 3x_1 - x_2 \\
\text{subject to} & x_1 - x_2 \le 1 \\
 & -x_1 + x_2 \le -4 \\
 & x_1, x_2 \ge 0.
\end{array}$$

7.9 Let $P_\mu$ denote the perturbed primal problem (with perturbation $\mu$). Show
that if $P_\mu$ is infeasible, then $P_{\mu'}$ is infeasible for every $\mu' \le \mu$. State and
prove an analogous result for the perturbed dual problem.

7.10 Using the notation of Figure 7.1, state precise conditions for detecting
infeasibility and/or unboundedness in the self-dual simplex method.


7.11 Consider the following one-parameter family of linear programming problems
(parametrized by $\mu$):

$$\begin{array}{rl}
\text{max} & (4 - 4\mu)x_0 - 2x_1 - 2x_2 - 2x_3 - 2x_4 \\
\text{s.t.} & x_0 - x_1 \le 1 \\
 & x_0 - x_2 \le 2 \\
 & x_0 - x_3 \le 4 \\
 & x_0 - x_4 \le 8 \\
 & x_0, x_1, x_2, x_3, x_4 \ge 0.
\end{array}$$

Starting from $\mu = \infty$, use the parametric simplex method to decrease $\mu$
as far as possible. Don't stop at $\mu = 0$. If you cannot get to $\mu = -\infty$,
explain why. Hint: the pivots are straightforward and, after the first couple,
a clear pattern should emerge which will make the subsequent pivots
easy. Clearly indicate the range of $\mu$ values for which each dictionary is
optimal.


Notes 

Parametric analysis has its roots in Gass and Saaty (1955). G.B. Dantzig's classic
book (Dantzig 1963) describes the self-dual simplex method under the name of
the self-dual parametric simplex method. It is a special case of "Lemke's algorithm"
for the linear complementarity problem (Lemke 1965) (see Exercise 18.7). Smale
(1983) and Borgwardt (1982) were first to realize that the parametric self-dual simplex
method is amenable to probabilistic analysis. For a more recent discussion
of homotopy methods and the parametric self-dual simplex method, see Nazareth
(1986, 1987).


CHAPTER  8 


Implementation  Issues 


In Chapter 6, we rewrote the simplex method using matrix notation.
This is the first step toward our aim of describing the simplex method as one would
implement it as a computer program. In this chapter, we shall continue in this direction
by addressing some important implementation issues.

The most time-consuming steps in the simplex method are the computations

$$\Delta x_{\mathcal{B}} = B^{-1}N e_j \quad \text{and} \quad \Delta z_{\mathcal{N}} = -(B^{-1}N)^T e_i,$$

and the difficulty in these steps arises from the $B^{-1}$. Of course, we don't ever
actually compute the inverse of the basis matrix. Instead, we calculate, say, $\Delta x_{\mathcal{B}}$ by
solving the following system of equations:

$$B \Delta x_{\mathcal{B}} = a_j, \tag{8.1}$$

where

$$a_j = N e_j$$

is the column of $N$ associated with nonbasic variable $x_j$.

Similarly, the calculation of $\Delta z_{\mathcal{N}}$ is also broken into two steps:

$$B^T v = e_i, \qquad \Delta z_{\mathcal{N}} = -N^T v. \tag{8.2}$$

Here, the first step is the solution of a large system of equations, this time involving
$B^T$ instead of $B$, and the second step is the comparatively trivial task of multiplying
a vector on the left by the matrix $-N^T$.
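
The two-step form of the dual computation follows from a short chain of identities, spelled out here as an added derivation (the intermediate vector is written $v$ to match the two-step system for the dual direction):

```latex
\Delta z_{\mathcal{N}}
  = -(B^{-1}N)^T e_i
  = -N^T (B^{-1})^T e_i
  = -N^T (B^T)^{-1} e_i
  = -N^T v,
\qquad \text{where } v = (B^T)^{-1} e_i, \text{ i.e., } B^T v = e_i.
```

So only one linear solve with $B^T$ is needed, followed by a matrix-vector product with $N^T$.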

Solving the systems of equations (8.1) and (8.2) is where most of the complexity
of a simplex iteration lies. We discuss solving such systems in the first two
sections. In the second section, we look at the effect of sparsity on these systems.
The  next  few  sections  explain  how  to  reuse  and/or  update  the  computations  of  one 
iteration  in  subsequent  iterations.  In  the  final  sections,  we  address  a  few  other  issues 
that  affect  the  efficiency  of  an  implementation. 

1. Solving Systems of Equations: LU-Factorization

In this section, we discuss solving systems of equations of the form

$$Bx = b,$$

where $B$ is an invertible $m \times m$ matrix and $b$ is an arbitrary $m$-vector. (Analysis of
the transpose $B^T x = b$ is left to Exercise 8.4.) Our first thought is to use Gaussian
elimination. This idea is correct, but to explain how Gaussian elimination is actually
implemented, we need to take a fresh look at how it works. To explain, let us
consider an example:

$$B = \begin{bmatrix}
2 & & 4 & & -2 \\
3 & 1 & & 1 & \\
-1 & & -1 & & -2 \\
 & -1 & & & -6 \\
 & & 1 & & 4
\end{bmatrix}.$$

(Note that, to emphasize the importance of sparsity, zero entries are simply left
blank.) In Gaussian elimination, one begins by subtracting appropriate multiples
of the first row from each subsequent row to get zeros in the first column below
the diagonal. For our specific example, we subtract 3/2 times the first row from
the second row and we subtract $-1/2$ times the first row from the third row. The
result is

$$\begin{bmatrix}
2 & & 4 & & -2 \\
 & 1 & -6 & 1 & 3 \\
 & & 1 & & -3 \\
 & -1 & & & -6 \\
 & & 1 & & 4
\end{bmatrix}.$$

Shortly, we will want to remember the values of the nonzero elements in the first
column. Therefore, let us agree to do the row operations that are required to eliminate
nonzeros, but when we write down the result of the elimination, we will leave
the nonzeros there. With this convention, the result of the elimination of the first
column can be written as

$$\left[\begin{array}{r|rrrr}
2 & & 4 & & -2 \\ \hline
3 & 1 & -6 & 1 & 3 \\
-1 & & 1 & & -3 \\
 & -1 & & & -6 \\
 & & 1 & & 4
\end{array}\right].$$


Note  that  we  have  drawn  a  line  to  separate  the  eliminated  top/left  parts  of  the  matrix 
from  the  uneliminated  lower-right  part. 

Next, we eliminate the nonzeros below the second diagonal (there's only one)
by subtracting an appropriate multiple of the second row from each subsequent row.
Again, we write the answer without zeroing out the eliminated elements:

$$\left[\begin{array}{rr|rrr}
2 & & 4 & & -2 \\
3 & 1 & -6 & 1 & 3 \\ \hline
-1 & & 1 & & -3 \\
 & -1 & -6 & 1 & -3 \\
 & & 1 & & 4
\end{array}\right].$$



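After eliminating the third column, we get

$$\left[\begin{array}{rrr|rr}
2 & & 4 & & -2 \\
3 & 1 & -6 & 1 & 3 \\
-1 & & 1 & & -3 \\ \hline
 & -1 & -6 & 1 & -21 \\
 & & 1 & & 7
\end{array}\right].$$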


Now,  the  remaining  uneliminated  part  is  already  an  upper  triangular  matrix,  and 
hence  no  more  elimination  is  required. 

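At this point, you are probably wondering how this strangely produced matrix
is related to the original matrix $B$. The answer is both simple and elegant. First, take
the final matrix and split it into three matrices: the matrix consisting of all elements
on or below the diagonal, the matrix consisting of just the diagonal elements, and
the matrix consisting of all elements on or above the diagonal. It is amazing but
true that $B$ is simply the product of the resulting lower triangular matrix times the
inverse of the diagonal matrix times the upper triangular matrix:

$$B = \begin{bmatrix}
2 & & & & \\
3 & 1 & & & \\
-1 & & 1 & & \\
 & -1 & -6 & 1 & \\
 & & 1 & & 7
\end{bmatrix}
\begin{bmatrix}
2 & & & & \\
 & 1 & & & \\
 & & 1 & & \\
 & & & 1 & \\
 & & & & 7
\end{bmatrix}^{-1}
\begin{bmatrix}
2 & & 4 & & -2 \\
 & 1 & -6 & 1 & 3 \\
 & & 1 & & -3 \\
 & & & 1 & -21 \\
 & & & & 7
\end{bmatrix}.$$

(If you don't believe it, multiply them and see.) Normally, the product of the lower
triangular matrix and the inverse of the diagonal matrix is denoted by $L$,

$$L = \begin{bmatrix}
2 & & & & \\
3 & 1 & & & \\
-1 & & 1 & & \\
 & -1 & -6 & 1 & \\
 & & 1 & & 7
\end{bmatrix}
\begin{bmatrix}
2 & & & & \\
 & 1 & & & \\
 & & 1 & & \\
 & & & 1 & \\
 & & & & 7
\end{bmatrix}^{-1}
= \begin{bmatrix}
1 & & & & \\
\frac{3}{2} & 1 & & & \\
-\frac{1}{2} & & 1 & & \\
 & -1 & -6 & 1 & \\
 & & 1 & & 1
\end{bmatrix},$$

and the upper triangular matrix is denoted by $U$:

$$U = \begin{bmatrix}
2 & & 4 & & -2 \\
 & 1 & -6 & 1 & 3 \\
 & & 1 & & -3 \\
 & & & 1 & -21 \\
 & & & & 7
\end{bmatrix}.$$

The resulting representation,

$$B = LU,$$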


is called an LU-factorization of $B$. Finding an LU-factorization is equivalent to
Gaussian elimination in the sense that multiplying $B$ on the left by $L^{-1}$ has the
effect of applying row operations to $B$ to put it into upper-triangular form $U$.

The value of an LU-factorization is that it can be used to solve systems of
equations. For example, suppose that we wish to solve equation (8.1), where $B$ is
as above and

$$a_j = \begin{bmatrix} 7 \\ -2 \\ 0 \\ 3 \\ 0 \end{bmatrix}. \tag{8.3}$$

First, we substitute $LU$ for $B$ so that the system becomes

$$LU \Delta x_{\mathcal{B}} = a_j.$$

Now, if we let $y = U \Delta x_{\mathcal{B}}$, then we can solve

$$Ly = a_j$$

for $y$, and once $y$ is known, we can solve

$$U \Delta x_{\mathcal{B}} = y$$

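for $\Delta x_{\mathcal{B}}$. Because $L$ is lower triangular, solving $Ly = a_j$ is easy. Indeed, writing the
system out,

$$\begin{bmatrix}
1 & & & & \\
\frac{3}{2} & 1 & & & \\
-\frac{1}{2} & & 1 & & \\
 & -1 & -6 & 1 & \\
 & & 1 & & 1
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}
= \begin{bmatrix} 7 \\ -2 \\ 0 \\ 3 \\ 0 \end{bmatrix},$$

we notice immediately that $y_1 = 7$. Then, given $y_1$, it becomes clear from the
second equation that $y_2 = -2 - (3/2)y_1 = -25/2$. Continuing in this way, we find
that

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}
= \begin{bmatrix} 7 \\ -\frac{25}{2} \\ \frac{7}{2} \\ \frac{23}{2} \\ -\frac{7}{2} \end{bmatrix}.$$

The process of successively solving for the elements of the vector $y$ starting with
the first and proceeding to the last is called forward substitution.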

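Of course, solving $U \Delta x_{\mathcal{B}} = y$ is easy too, since $U$ is upper triangular. The
system to solve is given by

$$\begin{bmatrix}
2 & & 4 & & -2 \\
 & 1 & -6 & 1 & 3 \\
 & & 1 & & -3 \\
 & & & 1 & -21 \\
 & & & & 7
\end{bmatrix}
\begin{bmatrix} \Delta x_1 \\ \Delta x_2 \\ \Delta x_3 \\ \Delta x_4 \\ \Delta x_5 \end{bmatrix}
= \begin{bmatrix} 7 \\ -\frac{25}{2} \\ \frac{7}{2} \\ \frac{23}{2} \\ -\frac{7}{2} \end{bmatrix}$$

(note that, to keep notations simple, we are assuming that the basic indices are 1
through 5 so that $\Delta x_{\mathcal{B}} = (\Delta x_1, \Delta x_2, \Delta x_3, \Delta x_4, \Delta x_5)$). This time we start with
the last equation and see that $\Delta x_5 = -1/2$. Then the second to last equation tells
us that $\Delta x_4 = 23/2 + 21(\Delta x_5) = 1$. After working our way to the first equation,
we have

$$\Delta x_{\mathcal{B}} = \begin{bmatrix} \Delta x_1 \\ \Delta x_2 \\ \Delta x_3 \\ \Delta x_4 \\ \Delta x_5 \end{bmatrix}
= \begin{bmatrix} -1 \\ 0 \\ 2 \\ 1 \\ -\frac{1}{2} \end{bmatrix}.$$

This process of working from the last element of $\Delta x_{\mathcal{B}}$ back to the first is called
backward substitution.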
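
The factorization and the two substitution passes of this section can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not production code: the function names are hypothetical, exact `Fraction` arithmetic is used so the results can be compared with the fractions in the text, the matrix is stored densely, and no row or column permutations are attempted (a zero pivot simply raises an error).

```python
from fractions import Fraction as F

def lu_factor(B):
    """LU-factorization without pivoting: returns (L, U), L unit lower triangular, B = L U."""
    n = len(B)
    U = [[F(v) for v in row] for row in B]
    L = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:
            raise ZeroDivisionError("zero pivot: a row or column permutation is needed")
        for i in range(k + 1, n):
            if U[i][k] != 0:                 # rows with a zero here need no work
                L[i][k] = U[i][k] / U[k][k]  # multiplier remembered in L
                for j in range(k, n):
                    U[i][j] -= L[i][k] * U[k][j]
    return L, U

def forward_sub(L, a):
    """Solve L y = a for lower triangular L, first component to last."""
    y = []
    for i in range(len(L)):
        y.append((a[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i])
    return y

def backward_sub(U, y):
    """Solve U x = y for upper triangular U, last component to first."""
    n = len(U)
    x = [F(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# The example matrix B and right-hand side a_j of this section.
B = [[2, 0, 4, 0, -2],
     [3, 1, 0, 1, 0],
     [-1, 0, -1, 0, -2],
     [0, -1, 0, 0, -6],
     [0, 0, 1, 0, 4]]
a_j = [7, -2, 0, 3, 0]

L, U = lu_factor(B)
y = forward_sub(L, a_j)
dx = backward_sub(U, y)
print(y)
print(dx)
```

Once `L` and `U` are available, each new right-hand side costs only the two substitution passes, which is why the factorization is worth keeping between solves.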


2.  Exploiting  Sparsity 

In  the  previous  section,  we  took  a  specific  matrix  B  and  constructed  an  LU 
factorization  of  it.  However,  with  that  example  we  were  lucky  in  that  every  diagonal 
element  was  nonzero  at  the  moment  it  was  used  to  eliminate  the  nonzeros  below 
it.  Had  we  encountered  a  zero  diagonal  element,  we  would  have  been  forced  to 
rearrange  the  columns  and/or  the  rows  of  the  matrix  to  put  a  nonzero  element  in  this 
position.  For  a  random  matrix  (whatever  that  means),  the  odds  of  encountering  a 
zero  are  nil,  but  a  basis  matrix  can  be  expected  to  have  plenty  of  zeros  in  it,  since, 
for  example,  it  is  likely  to  contain  columns  associated  with  slack  variables,  which 
are  all  zero  except  for  one  1.  A  matrix  that  contains  zeros  is  called  a  sparse  matrix. 

When a sparse matrix has lots of zeros, two things happen. First, the chances of being required to make row and/or column permutations are high. Second, additional computational efficiency can be obtained by making further row and/or column permutations with the aim of keeping $L$ and/or $U$ as sparse as possible.

The problem of finding the "best" permutation is, in itself, harder than the linear programming problem that we ultimately wish to solve. But there are simple heuristics that help to preserve sparsity in $L$ and $U$. We shall focus on just one such heuristic, called the minimum-degree ordering heuristic, which is described as follows:


Before eliminating the nonzeros below a diagonal "pivot" element, scan all uneliminated rows and select the sparsest row, i.e., that row having the fewest nonzeros in its uneliminated part (ties can be broken arbitrarily). Swap this row with the pivot row. Then scan the uneliminated nonzeros in this row and select that one whose column has the fewest nonzeros in its uneliminated part. Swap this column with the pivot column so that this nonzero becomes the pivot element. (Of course, provisions should be made to reject such a pivot element if its value is close to zero.)

As a matter of terminology, the number of nonzeros in the uneliminated part of a row/column is called the degree of the row/column. Hence, the name of the heuristic.

Let's apply the minimum-degree heuristic to the LU-factorization of the matrix $B$ studied in the previous section. To keep track of the row and column permutations, we will indicate original row indices on the left and original column indices across the top. Hence, we start with:

\[
B =
\begin{array}{r|rrrrr}
 & 1 & 2 & 3 & 4 & 5 \\ \hline
1 & 2 & & 4 & & -2 \\
2 & 3 & 1 & & 1 & \\
3 & -1 & & -1 & & -2 \\
4 & & -1 & & & -6 \\
5 & & & 1 & & 4
\end{array}
\]

To begin, row 4 has the fewest nonzeros, and within row 4, the $-1$ in column 2 belongs to the column with the fewest nonzeros. Hence, we swap rows 1 and 4 and we swap columns 1 and 2 to rewrite $B$ as

\[
B =
\begin{array}{r|rrrrr}
 & 2 & 1 & 3 & 4 & 5 \\ \hline
4 & -1 & & & & -6 \\
2 & 1 & 3 & & 1 & \\
3 & & -1 & -1 & & -2 \\
1 & & 2 & 4 & & -2 \\
5 & & & 1 & & 4
\end{array}
\]

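Now, we eliminate the nonzeros under the first diagonal element (and, as before, we leave the eliminated nonzeros as they were). The result is

\[
\begin{array}{r|rrrrr}
 & 2 & 1 & 3 & 4 & 5 \\ \hline
4 & -1 & & & & -6 \\
2 & 1 & 3 & & 1 & -6 \\
3 & & -1 & -1 & & -2 \\
1 & & 2 & 4 & & -2 \\
5 & & & 1 & & 4
\end{array}
\]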

Before doing the elimination associated with the second diagonal element, we note that row 5 is the row with minimum degree, and within row 5, the element 1 in column 3 has minimum column degree. Hence, we swap rows 2 and 5 and we swap columns 1 and 3 to get

\[
\begin{array}{r|rrrrr}
 & 2 & 3 & 1 & 4 & 5 \\ \hline
4 & -1 & & & & -6 \\
5 & & 1 & & & 4 \\
3 & & -1 & -1 & & -2 \\
1 & & 4 & 2 & & -2 \\
2 & 1 & & 3 & 1 & -6
\end{array}
\]

Now we eliminate the nonzeros under the second diagonal element to get

\[
\begin{array}{r|rrrrr}
 & 2 & 3 & 1 & 4 & 5 \\ \hline
4 & -1 & & & & -6 \\
5 & & 1 & & & 4 \\
3 & & -1 & -1 & & 2 \\
1 & & 4 & 2 & & -18 \\
2 & 1 & & 3 & 1 & -6
\end{array}
\]

For the third stage of elimination, note that row 3 is a minimum-degree row and that, among the nonzero elements of that row, the $-1$ is in a minimum-degree column. Hence, for this stage no permutations are needed. The result of the elimination is

\[
\begin{array}{r|rrrrr}
 & 2 & 3 & 1 & 4 & 5 \\ \hline
4 & -1 & & & & -6 \\
5 & & 1 & & & 4 \\
3 & & -1 & -1 & & 2 \\
1 & & 4 & 2 & & -14 \\
2 & 1 & & 3 & 1 &
\end{array}
\]

For the next stage of the elimination, both of the remaining two rows have the same degree, and hence we don't need to swap rows. But we do need to swap columns 5 and 4 to put the $-14$ into the diagonal position. The result of the swap is

\[
\begin{array}{r|rrrrr}
 & 2 & 3 & 1 & 5 & 4 \\ \hline
4 & -1 & & & -6 & \\
5 & & 1 & & 4 & \\
3 & & -1 & -1 & 2 & \\
1 & & 4 & 2 & -14 & \\
2 & 1 & & 3 & & 1
\end{array}
\]

At this point, we notice that the remaining $2 \times 2$ uneliminated part of the matrix is already upper triangular (in fact, diagonal), and hence no more elimination is needed.

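With the elimination completed, we can extract the matrices $L$ and $U$ in the usual way:

\[
L =
\begin{array}{r|rrrrr}
4 & -1 & & & & \\
5 & & 1 & & & \\
3 & & -1 & -1 & & \\
1 & & 4 & 2 & -14 & \\
2 & 1 & & 3 & & 1
\end{array}
\;
\begin{bmatrix}
-1 & & & & \\
 & 1 & & & \\
 & & -1 & & \\
 & & & -14 & \\
 & & & & 1
\end{bmatrix}^{-1}
=
\begin{array}{r|rrrrr}
4 & 1 & & & & \\
5 & & 1 & & & \\
3 & & -1 & 1 & & \\
1 & & 4 & -2 & 1 & \\
2 & -1 & & -3 & & 1
\end{array}
\]

and

\[
U =
\begin{array}{rrrrr}
2 & 3 & 1 & 5 & 4 \\ \hline
-1 & & & -6 & \\
 & 1 & & 4 & \\
 & & -1 & 2 & \\
 & & & -14 & \\
 & & & & 1
\end{array}
\]

(Note that the columns of $L$ and the rows of $U$ do not have any "original" indices associated with them, and so no permutation is indicated across the top of $L$ or down the left side of $U$.)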

This LU-factorization has five off-diagonal nonzeros in $L$ and three off-diagonal nonzeros in $U$ for a total of eight off-diagonal nonzeros. In contrast, the LU-factorization from the previous section had a total of 12 off-diagonal nonzeros. Hence, the minimum-degree ordering heuristic paid off for this example by reducing the number of nonzeros by 33%. While such a reduction may not seem like a big deal for small matrices such as our $5 \times 5$ example, for large matrices the difference can be dramatic.

The fact that we have permuted the rows and columns to get this factorization has only a small impact on how one uses the factorization to solve systems of equations. To illustrate, let us solve the same system that we considered before: $B\Delta x_{\mathcal B} = a_j$, where $a_j$ is given by (8.3). The first step in solving this system is to permute the rows of $a_j$ so that they agree with the rows of $L$ and then to use forward substitution to solve the system $Ly = a_j$. Writing it out, the system looks like this:

\[
\begin{array}{r|rrrrr}
4 & 1 & & & & \\
5 & & 1 & & & \\
3 & & -1 & 1 & & \\
1 & & 4 & -2 & 1 & \\
2 & -1 & & -3 & & 1
\end{array}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}
=
\begin{array}{r|r}
4 & 3 \\
5 & 0 \\
3 & 0 \\
1 & 7 \\
2 & -2
\end{array}
\]

The result of the forward substitution is that

\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}
=
\begin{bmatrix} 3 \\ 0 \\ 0 \\ 7 \\ 1 \end{bmatrix}.
\tag{8.4}
\]

The next step is to solve the system $U\Delta x_{\mathcal B} = y$. Writing this system out, we get

\[
\begin{array}{rrrrr}
2 & 3 & 1 & 5 & 4 \\ \hline
-1 & & & -6 & \\
 & 1 & & 4 & \\
 & & -1 & 2 & \\
 & & & -14 & \\
 & & & & 1
\end{array}
\begin{bmatrix} \Delta x_2 \\ \Delta x_3 \\ \Delta x_1 \\ \Delta x_5 \\ \Delta x_4 \end{bmatrix}
=
\begin{bmatrix} 3 \\ 0 \\ 0 \\ 7 \\ 1 \end{bmatrix}.
\]

Using backward substitution, we see that

\[
\begin{bmatrix} \Delta x_2 \\ \Delta x_3 \\ \Delta x_1 \\ \Delta x_5 \\ \Delta x_4 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 2 \\ -1 \\ -\tfrac{1}{2} \\ 1 \end{bmatrix}.
\]

Finally, we rewrite the solution listing the elements of $\Delta x_{\mathcal B}$ in their original order:

\[
\Delta x_{\mathcal B} =
\begin{bmatrix} \Delta x_1 \\ \Delta x_2 \\ \Delta x_3 \\ \Delta x_4 \\ \Delta x_5 \end{bmatrix}
=
\begin{bmatrix} -1 \\ 0 \\ 2 \\ 1 \\ -\tfrac{1}{2} \end{bmatrix}.
\]

Of course, the answer obtained here agrees with the one obtained at the end of the previous section.

Even with good fill-in minimizing heuristics such as minimum-degree, the LU-factorization remains a significant computational bottleneck. To see why, consider for the moment dense matrices. If we were to write a subroutine to carry out an LU-factorization, we would find that the main body of the routine would have a big triply nested loop:


for  each  column  index  j  { 

for  each  remaining  row  index  i  { 

for  each  remaining  column  index  k  { 

update  the  (i,k)  entry  in  accordance  with 
the  aim  to  make  the  (i,j)  entry  be  zero 

} 

} 

} 


Since each of these loops involves approximately $m$ steps, the LU-factorization routine requires about $m^3$ operations and hence is called an order $m^3$ algorithm. Similar considerations tell us that the forward and backward substitutions are both order $m^2$ algorithms. This means that forward and backward substitution can be done much faster than LU-factorization. Indeed, if $m = 5{,}000$, then factorization takes a couple of thousand times longer than a forward or backward substitution. Of course, this argument is for dense matrices. But for sparse matrices a similar, if less dramatic, effect is seen. Typically, for sparse matrices, one expects that factorization will take from 10 to 100 times longer than substitution. Therefore, it is important to perform as few LU-factorizations as possible. This is the subject of the next section.
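
To make the triply nested loop concrete, here is a minimal dense sketch in Python. It is an illustration under simplifying assumptions, not a production routine: it performs no row or column permutations (so it fails on a zero pivot), it ignores sparsity, and the name `dense_lu` and the update counter are introduced here only to show where the order $m^3$ work comes from.

```python
from fractions import Fraction as F

def dense_lu(B):
    """Factor B = L U without pivoting; returns (L, U, update_count)."""
    m = len(B)
    U = [[F(v) for v in row] for row in B]
    L = [[F(int(i == k)) for k in range(m)] for i in range(m)]
    count = 0
    for j in range(m):                  # each column index j
        for i in range(j + 1, m):       # each remaining row index i
            L[i][j] = U[i][j] / U[j][j]
            for k in range(j, m):       # each remaining column index k
                U[i][k] -= L[i][j] * U[j][k]   # makes the (i, j) entry zero
                count += 1
    return L, U, count

# The basis matrix of the running example, in its original ordering.
B = [[2, 0, 4, 0, -2],
     [3, 1, 0, 1, 0],
     [-1, 0, -1, 0, -2],
     [0, -1, 0, 0, -6],
     [0, 0, 1, 0, 4]]

L, U, count = dense_lu(B)
```

For this matrix the factors agree with the unpermuted $L$ and $U$ used in the first section, and the counter grows roughly like $m^3/3$ as $m$ increases.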


3.  Reusing  a  Factorization 

In the previous two sections, we showed how to use an LU-factorization of $B$ to solve the system of equations

\[ B\Delta x_{\mathcal B} = a_j \]

for the primal step direction $\Delta x_{\mathcal B}$. Since the basis matrix doesn't change much from one iteration of the simplex method to the next (columns get replaced by new ones one at a time), we ask whether the LU-factorization of $B$ from the current iteration might somehow be used again to solve the systems of equations that arise in the next iteration (or even the next several iterations).

Let $B$ denote the current basis (for which a factorization has already been computed) and let $\tilde B$ denote the basis of the next iteration. Then $\tilde B$ is simply $B$ with the column that holds the column vector $a_i$ associated with the leaving variable $x_i$ replaced by a new column vector $a_j$ associated with the entering variable $x_j$. This verbal description can be converted into a formula:

\[ \tilde B = B + (a_j - a_i)e_i^T. \tag{8.5} \]

Here, as before, $e_i$ denotes the vector that is all zeros except for a one in the position associated with index $i$; to be definite, let us say that this position is the $p$th position in the vector. To see why this formula is correct, it is helpful to realize that a column vector, say $a$, times $e_i^T$ produces a matrix that is all zero except for the $p$th column, which contains the column vector $a$.

Since the basis $B$ is invertible, (8.5) can be rewritten as

\[ \tilde B = B\left(I + B^{-1}(a_j - a_i)e_i^T\right). \]

Denote the matrix in parentheses by $E$. Recall that $a_j = Ne_j$, since it is the column vector from $A$ associated with the entering variable $x_j$. Hence,

\[ B^{-1}a_j = B^{-1}Ne_j = \Delta x_{\mathcal B}, \]

which is a vector we need to compute in the current iteration anyway. Also,

\[ B^{-1}a_i = e_i, \]

since $a_i$ is the column of $B$ associated with the leaving variable $x_i$. Therefore, we can write $E$ more simply as

\[ E = I + (\Delta x_{\mathcal B} - e_i)e_i^T. \]

Now, if $E$ has a simple inverse, then we can use it together with the LU-factorization of $B$ to provide an efficient means of solving systems of equations involving $\tilde B$. The following proposition shows that $E$ does indeed have a simple inverse.


PROPOSITION 8.1. Given two column vectors $u$ and $v$ for which $1 + v^Tu \neq 0$,

\[ (I + uv^T)^{-1} = I - \frac{uv^T}{1 + v^Tu}. \]

PROOF. The proof is trivial. We simply multiply the matrix by its supposed inverse and check that we get the identity:

\begin{align*}
(I + uv^T)\left(I - \frac{uv^T}{1 + v^Tu}\right)
&= I + uv^T - \frac{uv^T}{1 + v^Tu} - \frac{uv^Tuv^T}{1 + v^Tu} \\
&= I + uv^T\left(1 - \frac{1}{1 + v^Tu} - \frac{v^Tu}{1 + v^Tu}\right) \\
&= I,
\end{align*}

where the last equality follows from the observation that the parenthesized expression vanishes. □
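
As a quick sanity check on the identity, the following Python sketch builds $I + uv^T$ for a small pair of vectors, applies the formula for the inverse, and multiplies the two together. The vectors are arbitrary illustrative values and the helper names are not from the text; exact fractions are used so that the product can be compared with the identity without rounding.

```python
from fractions import Fraction as F

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def rank_one_inverse(u, v):
    """Return (I + u v^T)^{-1} = I - u v^T / (1 + v^T u)."""
    denom = 1 + sum(vi * ui for vi, ui in zip(v, u))
    if denom == 0:
        raise ZeroDivisionError("1 + v^T u = 0: I + u v^T is singular")
    n = len(u)
    I, uvT = identity(n), outer(u, v)
    return [[I[i][j] - uvT[i][j] / denom for j in range(n)] for i in range(n)]

u = [F(2), F(-1), F(3)]          # arbitrary example vectors
v = [F(1), F(4), F(1, 2)]
M = [[a + b for a, b in zip(ri, ro)]
     for ri, ro in zip(identity(3), outer(u, v))]
check = matmul(M, rank_one_inverse(u, v))
```

Here `check` equals the 3-by-3 identity matrix exactly.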


The identity in Proposition 8.1 may seem mysterious, but in fact it has a simple derivation based on the explicit formula for the sum of a geometric series:

\[ \sum_{j=0}^{\infty} \xi^j = \frac{1}{1 - \xi}, \qquad \text{for } |\xi| < 1. \]

This is an identity for real numbers, but it also holds for matrices:

\[ \sum_{j=0}^{\infty} X^j = (I - X)^{-1}, \]

provided that the absolute value of each of the eigenvalues of $X$ is less than one (we don't prove this here, since it's just for motivation). Assuming, for the moment, that the absolute values of the eigenvalues of $uv^T$ are less than one (actually, all but one of them are zero), we can expand $(I + uv^T)^{-1}$ in a geometric series, reassociate products, and collapse the resulting geometric series to get

\begin{align*}
(I + uv^T)^{-1}
&= I - uv^T + (uv^T)(uv^T) - (uv^T)(uv^T)(uv^T) + \cdots \\
&= I - uv^T + u(v^Tu)v^T - u(v^Tu)(v^Tu)v^T + \cdots \\
&= I - u\left(1 - v^Tu + (v^Tu)^2 - \cdots\right)v^T \\
&= I - u\,\frac{1}{1 + v^Tu}\,v^T \\
&= I - \frac{uv^T}{1 + v^Tu},
\end{align*}

where the last equality follows from the fact that $1/(1 + v^Tu)$ is a scalar and therefore can be pulled out of the vector/matrix calculation.

Applying Proposition 8.1 to matrix $E$, we see that

\begin{align*}
E^{-1}
&= I - \frac{(\Delta x_{\mathcal B} - e_i)e_i^T}{1 + e_i^T(\Delta x_{\mathcal B} - e_i)} \\
&= I - \frac{(\Delta x_{\mathcal B} - e_i)e_i^T}{\Delta x_i} \\
&=
\begin{bmatrix}
1 & & & -\frac{\Delta x_{j_1}}{\Delta x_i} & & & \\
 & \ddots & & \vdots & & & \\
 & & 1 & -\frac{\Delta x_{j_{p-1}}}{\Delta x_i} & & & \\
 & & & \frac{1}{\Delta x_i} & & & \\
 & & & -\frac{\Delta x_{j_{p+1}}}{\Delta x_i} & 1 & & \\
 & & & \vdots & & \ddots & \\
 & & & -\frac{\Delta x_{j_m}}{\Delta x_i} & & & 1
\end{bmatrix}.
\end{align*}

Now, let's look at the systems of equations that need to be solved in the next iteration. Using tildes to denote items associated with the next iteration, we see that we need to solve

\[ \tilde B\Delta\tilde x_{\mathcal B} = \tilde a_j \qquad \text{and} \qquad \tilde B^T\tilde v = \tilde e_i \]

(actually, we should probably put the tilde on the $j$ instead of the $a_j$ and on the $i$ instead of the $e_i$, but doing so seems less aesthetically appealing, even though it's more correct). Recalling that $\tilde B = BE$, we see that the first system is equivalent to

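\[ BE\Delta\tilde x_{\mathcal B} = \tilde a_j, \]

which can be solved in two stages:

\begin{align*}
Bu &= \tilde a_j, \\
E\Delta\tilde x_{\mathcal B} &= u.
\end{align*}

Of course, the second system (involving $E$) is trivial, since we have an explicit formula for the inverse of $E$:

\[ \Delta\tilde x_{\mathcal B} = E^{-1}u = u - \frac{u_i}{\Delta x_i}(\Delta x_{\mathcal B} - e_i) \]

(where, in keeping with our tradition, we have used $u_i$ to denote the element of $u$ associated with the basic variable $x_i$; that is, $u_i$ is the $p$th entry of $u$).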
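
The formula for $E^{-1}u$ costs only one pass over the vector. A minimal Python sketch follows; the function name and the 0-based position `p` are assumptions made for illustration, and the vector `u` is an arbitrary example rather than data from the text.

```python
from fractions import Fraction as F

def apply_eta_inverse(u, dx_B, p):
    """Return E^{-1} u, where E = I + (dx_B - e_p) e_p^T (p is 0-based)."""
    ratio = u[p] / dx_B[p]
    out = [ui - ratio * di for ui, di in zip(u, dx_B)]
    out[p] = ratio   # u_p - ratio * (dx_p - 1) simplifies to ratio
    return out

dx_B = [F(-1), F(0), F(2), F(1), F(-1, 2)]   # step direction from the example
u = [F(1), F(2), F(3), F(4), F(5)]           # arbitrary illustrative vector
x = apply_eta_inverse(u, dx_B, p=2)

# Multiply back by E: column p of E is dx_B, the other columns are identity.
Ex = [dx_B[k] * x[2] + (x[k] if k != 2 else 0) for k in range(5)]
```

Multiplying the result by $E$ again recovers `u`, which confirms that the update applies $E^{-1}$ without ever forming the matrix.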

The system involving $\tilde B^T$ is handled in the same manner. Indeed, first we rewrite it as

\[ E^TB^T\tilde v = \tilde e_i \]

and then observe that it too can be solved in two steps:

\begin{align*}
E^Tu &= \tilde e_i, \\
B^T\tilde v &= u.
\end{align*}

This time, the first step is the trivial one¹:

\[ u = E^{-T}\tilde e_i = \tilde e_i - e_i\,\frac{(\Delta x_{\mathcal B} - e_i)^T\tilde e_i}{\Delta x_i}. \]

Note that the fraction in the preceding equation is a scalar, and so this final expression for $u$ shows that it is a vector with at most two nonzeros; that is, the result is utterly trivial even if the formula looks a little bit cumbersome.

We end this section by returning briefly to our example. Suppose that $\tilde B$ is $B$ with column 3 replaced by the vector $a_j$ given in (8.3). Suppose that


To solve $\tilde B\Delta\tilde x_{\mathcal B} = \tilde a_j$, we first solve $Bu = \tilde a_j$ using our LU-factorization of $B$. The result of the forward and backward substitutions is


Next, we solve for $\Delta\tilde x_{\mathcal B} = E^{-1}u$:

\[ \Delta\tilde x_{\mathcal B} = u - \frac{u_3}{\Delta x_3}(\Delta x_{\mathcal B} - e_3) \]

Of course, it is easy to check that we have gotten the correct answer: simply multiply $\tilde B$ times $\Delta\tilde x_{\mathcal B}$ and check that it equals $\tilde a_j$. It does.

¹ Occasionally we use the superscript $-T$ for the transpose of the inverse of a matrix. Hence, $E^{-T} = (E^{-1})^T$.


4.  Performance  Tradeoffs 

The idea of writing the next basis as a product of the current basis times an easily invertible matrix can be extended over several iterations. For example, if we look $k$ iterations out, we can write

\[ B_k = B_0E_0E_1\cdots E_{k-1}. \]

If we have an LU-factorization of $B_0$ and we have saved enough information to reconstruct each $E_j$, then we can use this product to solve systems of equations involving $B_k$.

Note that in order to reconstruct $E_j$, all we need to save is the primal step direction vector $\Delta x_{\mathcal B}^j$ (and an integer telling which column it goes in). In actual implementations, these vectors are stored in lists. For historical reasons, this list is called an eta-file (and the matrices $E_j$ are called eta matrices). Given the LU-factorization of $B_0$ and the eta-file, it is an easy matter to solve systems of equations involving either $B$ or $B^T$. However, as $k$ gets large, the amount of work required to go through the entire eta-file begins to dominate the amount of work that would be required to simply form a new LU-factorization of the current basis. Hence, the best strategy is to use an eta-file but with periodic refactorization of the basis (and accompanying purging of the eta-file).
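
A minimal sketch of an eta-file in Python is shown below. Everything about its interface is an assumption made for illustration: `solve_base` stands for whatever routine applies the stored LU-factorization of $B_0$ (forward plus backward substitution), positions are 0-based, and only systems with the basis itself (not its transpose) are handled.

```python
from fractions import Fraction as F

class EtaFile:
    """A base solver plus a list of (position, step-direction) records."""

    def __init__(self, solve_base):
        self.solve_base = solve_base   # callable: b -> B_0^{-1} b
        self.etas = []                 # [(p, dx_B), ...] in pivot order

    def solve(self, b):
        """Solve B_k x = b, where B_k = B_0 E_0 E_1 ... E_{k-1}."""
        x = self.solve_base(b)
        for p, dx in self.etas:        # apply E_0^{-1}, then E_1^{-1}, ...
            ratio = x[p] / dx[p]
            x = [xi - ratio * di for xi, di in zip(x, dx)]
            x[p] = ratio
        return x

    def replace_column(self, p, new_column):
        """Record the basis change that puts new_column in position p."""
        dx = self.solve(new_column)    # current step direction B_k^{-1} a_j
        self.etas.append((p, dx))

    def refactor(self, solve_base):
        """Swap in a fresh factorization and purge the eta records."""
        self.solve_base, self.etas = solve_base, []

# Toy usage with B_0 = I, chosen only to keep the arithmetic visible.
eta = EtaFile(lambda b: [F(v) for v in b])
eta.replace_column(0, [2, 1])          # basis becomes [[2, 0], [1, 1]]
eta.replace_column(1, [1, 3])          # basis becomes [[2, 1], [1, 3]]
x = eta.solve([3, 4])
```

Each call to `solve` walks the whole list, so its cost grows with the number of stored eta records; that growth is what eventually makes refactorization worthwhile.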

The question then becomes: how often should one recompute a factorization of the current basis? To answer this question, suppose that we know that it takes $F$ arithmetic operations to form an LU-factorization (of a typical basis for the problem at hand), $S$ operations to do one forward/backward substitution, and $E$ operations to multiply by the inverse of one eta-matrix. Then the number of operations for the initial iteration of the simplex method is $F + 2S$ (since we need to do an LU-factorization and two forward/backward substitutions, one for the system involving the basis and the other for the system involving its transpose). Then, in the next iteration, we need to do two forward/backward substitutions and two eta-inverse calculations. Each subsequent iteration is the same as the previous, except that there are two extra eta-inverse calculations. Hence, the average number of arithmetic operations per iteration if we refactorize after every $K$ iterations is

\begin{align*}
T(K) &= \frac{1}{K}\bigl((F + 2S) + 2(S + E) + 2(S + 2E) + \cdots + 2(S + (K-1)E)\bigr) \\
&= \frac{1}{K}F + 2S + (K-1)E.
\end{align*}

Treating $K$ as a real variable for the moment, we can differentiate this expression with respect to $K$, set the derivative equal to zero, and solve for $K$ to get an estimate for the optimal choice of $K$:

\[ K = \sqrt{\frac{F}{E}}. \]

As should be clear from our earlier discussions, $E$ is of order $m$ and, if the basis matrix is dense, $F$ is of order $m^3$. Hence, for dense matrices, our estimates would indicate that refactorizations should take place every $m$ iterations or so. However, for sparse matrices, $F$ will be substantially less than $m^3$ (more like a constant times $m^2$), which would indicate that refactorizations should occur on the order of every $\sqrt{m}$ iterations. In practice, one typically allows the value of $K$ to be a user-settable parameter whose default value is set to something like 100.
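
The trade-off can be explored numerically with a short Python sketch. The cost model is the average $T(K) = F/K + 2S + (K-1)E$ from the preceding discussion, whose stationary point is $K = \sqrt{F/E}$; the problem size and the choice of constants equal to one are purely hypothetical.

```python
import math

def avg_ops_per_iteration(K, F, S, E):
    """T(K) = F/K + 2S + (K - 1)E, the average cost per iteration."""
    return F / K + 2 * S + (K - 1) * E

def refactor_interval(F, E):
    """Stationary point of T(K) treated as a real function: sqrt(F/E)."""
    return math.sqrt(F / E)

m = 1000                         # hypothetical number of constraints
F_dense, S, E = m**3, m**2, m    # dense-case orders, constants ignored
K_star = refactor_interval(F_dense, E)
```

With these dense-case orders of magnitude the estimate comes out at $K = m$, and nearby integer values of $K$ give a slightly larger average cost.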


5.  Updating  a  Factorization 


There is an important alternative to the eta-matrix method for reusing an LU-factorization, which we shall describe in this section and the next. As always, it is easiest to work with an example, so let's continue with the same example we've been using throughout this chapter.

Recall that the matrix $\tilde B$ is simply $B$ with its third column replaced by the vector $a_j$ given in (8.3):

(Note that we've highlighted the new column by putting a box around it.)

Since $L^{-1}B = U$ and $\tilde B$ differs from $B$ in only one column, it follows that $L^{-1}\tilde B$ coincides with $U$ except for the column that got changed. And, since this column got replaced by $a_j$, it follows that this column of $L^{-1}\tilde B$ contains $L^{-1}a_j$, which we've already computed and found to be given by (8.4). Hence,


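\[
L^{-1}\tilde B =
\begin{array}{r|rrrrr}
 & 2 & 3 & 1 & 5 & 4 \\ \hline
1 & -1 & 3 & & -6 & \\
2 & & & & 4 & \\
3 & & & -1 & 2 & \\
4 & & 7 & & -14 & \\
5 & & 1 & & & 1
\end{array}
\tag{8.6}
\]

(the boxed column is the one labeled 3).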



As we saw before, the columns of $L$ have no "original" indices to relate back to, and so one can simply take them to be numbered in natural order. The same is then true for the rows of $L^{-1}$ and hence for the rows of $L^{-1}\tilde B$. That is why the rows shown above are numbered as they are. We've shown these numbers explicitly, since they are about to get permuted.

The boxed column in (8.6) is called a spike, since it has nonzeros below the diagonal. The $4 \times 4$ submatrix constructed from rows 2 through 5 and columns 3, 1, 5, and 4 is called the bump. To get this matrix back into upper-triangular form, one could do row operations to eliminate the nonzeros in the spike that lie below the diagonal. But such row operations could create fill-in anywhere in the bump. Such fill-in would be more than one would like to encounter. However, consider what happens if the spike column is moved to the rightmost column of the bump, shifting the other columns left one position in the process, and if the top row of the bump (i.e., row 2) is moved to the bottom of the bump, shifting the other bump rows up by one. The result of these permutations is

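\[
\begin{array}{r|rrrrr}
 & 2 & 1 & 5 & 4 & 3 \\ \hline
1 & -1 & & -6 & & 3 \\
3 & & -1 & 2 & & \\
4 & & & -14 & & 7 \\
5 & & & & 1 & 1 \\
2 & & & 4 & &
\end{array}
\]

(For future reference, the bump is the submatrix formed by rows 3, 4, 5, 2 and columns 1, 5, 4, 3.) In general, the effect of this permutation is that the column spike gets replaced by a row spike along the bottom row of the bump.

Now any fill-in produced by row operations is confined to the spike row. In our example, there is only one nonzero in the spike row, and to eliminate it we need to add 2/7 times row 4 to it. This row operation can be represented algebraically as multiplication on the left by the matrix

\[
\tilde E =
\begin{array}{r|rrrrr}
 & 1 & 3 & 4 & 5 & 2 \\ \hline
1 & 1 & & & & \\
3 & & 1 & & & \\
4 & & & 1 & & \\
5 & & & & 1 & \\
2 & & & \tfrac{2}{7} & & 1
\end{array}
\]

That is, applying $\tilde E$ to the permuted matrix above gives

\[
\begin{array}{r|rrrrr}
 & 2 & 1 & 5 & 4 & 3 \\ \hline
1 & -1 & & -6 & & 3 \\
3 & & -1 & 2 & & \\
4 & & & -14 & & 7 \\
5 & & & & 1 & 1 \\
2 & & & & & 2
\end{array}
\]

If we denote the new upper triangular matrix by $\tilde U$, then, solving for $\tilde B$, we get the following factorization of $\tilde B$:

\[ \tilde B = L\tilde E^{-1}\tilde U. \]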


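We can use this new factorization to solve systems of equations. For example, to solve

\[ \tilde B\Delta\tilde x_{\mathcal B} = \tilde a_j, \]

we first solve

\[ Ly = \tilde a_j \tag{8.7} \]

for $y$. Then, given $y$, we compute

\[ z = \tilde E y, \]

and finally we solve

\[ \tilde U\Delta\tilde x_{\mathcal B} = z \]

for $\Delta\tilde x_{\mathcal B}$. It is clear from the following chain of equalities that these three steps compute $\Delta\tilde x_{\mathcal B}$:

\[ \Delta\tilde x_{\mathcal B} = \tilde U^{-1}z = \tilde U^{-1}\tilde E y = \tilde U^{-1}\tilde E L^{-1}\tilde a_j = \tilde B^{-1}\tilde a_j. \]

For our example, we use forward substitution to solve (8.7) for $y$. The result is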
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o 

i _ 

1 

i 

o 

i _ 

2 

-1 

3 

-1 

oo 

-1 

=  4 

7 

4 

7 

5 

CO 

5 

i 

CO 

_ 1 

2 

-1 

Next, we apply the row operations (in our case, there is only one) to get $z$:

Finally, backward substitution using $\tilde U$ is performed to compute $\Delta\tilde x_{\mathcal B}$:


2  15  4  3 


1 

-1  -6  3" 
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0" 

3 

-1  2 
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3 

-1 
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-14  7 
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? 

=  4 
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2 
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The  result  of  the  backward  substitution  is 


Axb  = 


2 

1 

3  " 

1 

2 

1 

2 

r  I"! 

2 

3 

5 

1 

4 

=  3 

1 

2 

4 

7 

4 

_  7 

3 

2 

1 

L  2  J 

5 

i 

CM  i-HlxF 

which  agrees  with  the  solution  we  got  before  using  eta-matrices. 
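
The elimination of the spike row can be sketched in Python as follows. This is a simplified illustration, not the book's implementation: it assumes the rows and columns have already been permuted, that the spike row is the last row of the matrix (as in the example), and that the matrix is stored densely. The function name is made up here.

```python
from fractions import Fraction as F

def eliminate_spike_row(H):
    """Restore upper-triangular form when only the last row of H has
    nonzeros left of the diagonal. Returns (multipliers, U_new)."""
    U = [[F(v) for v in row] for row in H]
    last = len(U) - 1
    multipliers = {}
    for k in range(last):
        if U[last][k] != 0:
            mult = -U[last][k] / U[k][k]   # add mult * row k to the spike row
            multipliers[k] = mult
            U[last] = [a + mult * b for a, b in zip(U[last], U[k])]
    return multipliers, U

# Permuted matrix from the example: rows (1,3,4,5,2), columns (2,1,5,4,3).
H = [[-1, 0, -6, 0, 3],
     [0, -1, 2, 0, 0],
     [0, 0, -14, 0, 7],
     [0, 0, 0, 1, 1],
     [0, 0, 4, 0, 0]]
multipliers, U_new = eliminate_spike_row(H)
```

The single recorded multiplier is 2/7 (applied to the row in position 2, which is row 4 in the text's labeling), and the only entries that change lie in the spike row. The multipliers are exactly the information needed to apply the row operations to a right-hand side later.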


6.  Shrinking  the  Bump 


There is an important enhancement to the factorization updating technique described in the previous section. After permuting rows and columns converting the spike column into a spike row, we can exploit the fact that the spike row is often very sparse (coming as it does from what was originally the top row of the bump) and do further row and column permutations to reduce the size of the bump. To see what we mean, let's look at our example. First, we note that the leftmost element of the spike row is zero (and hence that the left column of the bump is a singleton column). Therefore, we can simply declare that this column and the corresponding top row do not belong to the bump. That is, we can immediately reduce the size of the bump by one:

\[
\begin{array}{r|rrrrr}
 & 2 & 1 & 5 & 4 & 3 \\ \hline
1 & -1 & & -6 & & 3 \\
3 & & -1 & 2 & & \\
4 & & & -14 & & 7 \\
5 & & & & 1 & 1 \\
2 & & & 4 & &
\end{array}
\]

(the bump now consists of rows 4, 5, 2 and columns 5, 4, 3).

This idea can be extended to any column that has a single nonzero in the bump. For example, column 4 is a singleton column too. The trick now is to move this column to the leftmost column of the bump, pushing the intermediate columns to the right, and to apply the same permutation to the rows. After permuting the rows and columns like this, the bump can be reduced in size again:

\[
\begin{array}{r|rrrrr}
 & 2 & 1 & 4 & 5 & 3 \\ \hline
1 & -1 & & & -6 & 3 \\
3 & & -1 & & 2 & \\
5 & & & 1 & & 1 \\
4 & & & & -14 & 7 \\
2 & & & & 4 &
\end{array}
\]

(the bump now consists of rows 4, 2 and columns 5, 3).

Furthermore, this reduction in the bump causes a new singleton column to appear (since a singleton need only be a singleton within the bump), namely, column 3. Hence, we permute once again. This time just the column gets permuted, since the singleton is already in the correct row. The bump gets reduced in size once again, now to a $1 \times 1$ bump, which is not really a bump at all:

\[
\begin{array}{r|rrrrr}
 & 2 & 1 & 4 & 3 & 5 \\ \hline
1 & -1 & & & 3 & -6 \\
3 & & -1 & & & 2 \\
5 & & & 1 & 1 & \\
4 & & & & 7 & -14 \\
2 & & & & & 4
\end{array}
\]


Note  that  we  have  restored  upper  triangularity  using  only  permutations;  no  row 
operations  were  needed.  While  this  doesn’t  always  happen,  it  is  a  common  and 
certainly  welcome  event. 

Our  example,  being  fairly  small,  doesn’t  exhibit  all  the  possible  bump-reducing 
permutations.  In  addition  to  looking  for  singleton  columns,  one  can  also  look  for 
singleton  rows.  Each  singleton  row  can  be  moved  to  the  bottom  of  the  bump.  At  the 
same  time,  the  associated  column  is  moved  to  the  right-hand  column  of  the  bump. 
After  this  permutation,  the  right-hand  column  and  the  bottom  row  can  be  removed 
from  the  bump. 

Before closing this section, we reiterate a few important points. First, as the bump gets smaller, the chances of finding further singletons increase. Also, with the exception of the lower-right diagonal element of the bump, all other diagonal elements are guaranteed to be nonzero, since the matrix $U$ from which $\tilde U$ is derived has this property. Therefore, most bump reductions apply the same permutation to the rows as to the columns. Finally, we have illustrated how to update the factorization once, but this technique can, of course, be applied over and over. Eventually, however, it becomes more efficient to refactorize the basis from scratch.
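
Detecting singleton columns is a simple counting exercise. In the Python sketch below, the helper name is made up, `bump` is a list of 0-based positions that serve as both the rows and the columns of the bump, and the matrix is the permuted matrix of the example stored densely.

```python
def singleton_columns(H, bump):
    """Positions in bump whose column has exactly one nonzero within the bump."""
    return [c for c in bump
            if sum(1 for r in bump if H[r][c] != 0) == 1]

# Permuted matrix from the example: rows (1,3,4,5,2), columns (2,1,5,4,3).
H = [[-1, 0, -6, 0, 3],
     [0, -1, 2, 0, 0],
     [0, 0, -14, 0, 7],
     [0, 0, 0, 1, 1],
     [0, 0, 4, 0, 0]]

first_pass = singleton_columns(H, [1, 2, 3, 4])   # the full 4x4 bump
second_pass = singleton_columns(H, [2, 4])        # after removing those two
```

The first pass finds positions 1 and 3, which carry the column labels 1 and 4; once those are removed from the bump, the second pass finds position 4, the column labeled 3. This mirrors the sequence of reductions carried out above.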


7.  Partial  Pricing 

In many real-world problems, the number of constraints $m$ is small compared with the number of variables $n$. Looking over the steps of the primal simplex method, we see that the only steps involving $n$-vectors are Step 2, in which we pick a nonbasic variable to be the entering variable,

\[ \text{pick } j \in \{ j \in \mathcal N : z_j^* < 0 \}; \]

Step 6, in which we compute the step direction for the dual variables,

\[ \Delta z_{\mathcal N} = -(B^{-1}N)^Te_i; \]

and Step 8, in which we update the dual variables,

\[ z_{\mathcal N}^* \leftarrow z_{\mathcal N}^* - s\Delta z_{\mathcal N}. \]

Scanning all the nonbasic indices in Step 2 requires looking at $n$ candidates. When $n$ is huge, this step is likely to be a bottleneck step for the algorithm. However, there is no requirement that all indices be scanned. We could simply scan from the beginning and stop at the first index $j$ for which $z_j^*$ is negative (as in Bland's rule). However, in practice, it is felt that picking an index $j$ corresponding to a very negative $z_j^*$ produces an algorithm that is likely to reach optimality faster. Therefore, the following scheme, referred to as partial pricing, is often employed. Initially, scan only a fraction of the indices (say $n/3$), and set aside a handful of good ones (say, the 40 or so having the most negative $z_j^*$). Then use only these 40 in Steps 2, 6, and 8 for subsequent iterations until less than a certain fraction (say, 1/2) of them remain eligible. At this point, use (6.8) to compute the current values of a new batch of $n/3$ nonbasic dual variables, and go back to the beginning of this partial pricing process by setting aside the best 40. In this way, most of the iterations look like they only have 40 nonbasic variables. Only occasionally does the grim reality of the full huge number of nonbasic variables surface.
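
The candidate-list bookkeeping can be sketched in Python as follows. The function names, the cyclic scan, and the dictionary of dual values are assumptions made for illustration; in a real code the scanned values of $z_j^*$ would be recomputed from (6.8) rather than read from a table.

```python
def refresh_candidates(z, nonbasic, start, batch, keep=40):
    """Scan `batch` nonbasic indices starting at `start` (cyclically) and
    return up to `keep` of them, most negative z value first."""
    n = len(nonbasic)
    scanned = [nonbasic[(start + t) % n] for t in range(min(batch, n))]
    eligible = [j for j in scanned if z[j] < 0]
    return sorted(eligible, key=lambda j: z[j])[:keep]

def pick_entering(z, candidates):
    """Most negative z among the candidates, or None if none is eligible."""
    eligible = [j for j in candidates if z[j] < 0]
    return min(eligible, key=lambda j: z[j]) if eligible else None

# Toy data: six nonbasic indices, scan half of them, keep the best two.
z = {0: 3, 1: -5, 2: -1, 3: 0, 4: -7, 5: -2}
nonbasic = [0, 1, 2, 3, 4, 5]
candidates = refresh_candidates(z, nonbasic, start=0, batch=3, keep=2)
entering = pick_entering(z, candidates)
```

Note that the index with the most negative value overall (index 4 here) is not seen until a later batch is scanned; that is the price paid for looking at only part of the list.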

Looking at the dual simplex method (Figure 6.1), we see that we aren't so lucky. In it, vectors of length $n$ arise in the max-ratio test:

\[ t = \left(\max_{j \in \mathcal N}\frac{\Delta z_j}{z_j^*}\right)^{-1}, \qquad \text{pick } j \in \operatorname{argmax}_{j \in \mathcal N}\frac{\Delta z_j}{z_j^*}. \]

Here, the entire collection of nonbasic indices must be checked; otherwise, dual feasibility will be lost and the algorithm will fail. Therefore, in cases where $n$ is huge relative to $m$ and partial pricing is used, it is important not to use the dual simplex method as a Phase I procedure. Instead, one should use the technique of adding artificial variables as we did in Chapter 2 to force an initial feasible solution.


8.  Steepest  Edge 

In  Chapter  4,  we  saw  that  one  of  the  drawbacks  of  the  largest-coefficient  rule 
is  its  sensitivity  to  the  scale  in  which  variables  are  quantified.  In  this  section,  we 
shall  discuss  a  pivot  rule  that  partially  remedies  this  problem.  Recall  that  each  step 
of  the  simplex  method  is  a  step  along  an  edge  of  the  feasible  region  from  one  vertex 
to  an  adjacent  vertex.  The  largest  coefficient  rule  picks  the  variable  that  gives  the 
largest  rate  of  increase  of  the  objective  function.  However,  this  rate  of  increase  is 
measured  in  the  “space  of  nonbasic  variables”  (we  view  the  basic  variables  simply 
as  dependent  variables).  Also,  this  space  changes  from  one  iteration  to  the  next. 
Hence,  in  a  certain  respect,  it  would  seem  wiser  to  measure  the  rate  of  increase  in 
the  larger  space  consisting  of  all  the  variables,  both  basic  and  nonbasic.  When  the 
rate  of  increase  is  gauged  in  this  larger  space,  the  pivot  rule  is  called  the  steepest- 
edge  rule.  It  is  the  subject  of  this  section. 

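Fix a nonbasic index $j \in \mathcal{N}$. We wish to consider whether $x_j$ should be the
entering variable. If it were, the step direction vector would be

\[
\Delta x =
\begin{bmatrix} \Delta x_{\mathcal{B}} \\ \Delta x_{\mathcal{N}} \end{bmatrix}
=
\begin{bmatrix} -B^{-1} N e_j \\ e_j \end{bmatrix}.
\]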




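This vector points out along the edge corresponding to the pivot that would result
by letting $x_j$ enter the basis. As we know, the objective function is

\[
f(x) = c^T x = c_{\mathcal{B}}^T x_{\mathcal{B}} + c_{\mathcal{N}}^T x_{\mathcal{N}}.
\]

The derivative of $f(x)$ in the direction of $\Delta x$ is given by

\[
\frac{\partial f}{\partial \Delta x}
= c^T \frac{\Delta x}{\|\Delta x\|}
= \frac{c_{\mathcal{B}}^T \Delta x_{\mathcal{B}} + c_{\mathcal{N}}^T \Delta x_{\mathcal{N}}}{\|\Delta x\|}.
\]

The numerator is easy (and familiar):

\[
\begin{aligned}
c_{\mathcal{B}}^T \Delta x_{\mathcal{B}} + c_{\mathcal{N}}^T \Delta x_{\mathcal{N}}
&= c_j - c_{\mathcal{B}}^T B^{-1} N e_j \\
&= \left( c_{\mathcal{N}} - (B^{-1} N)^T c_{\mathcal{B}} \right)_j \\
&= -z_j^*.
\end{aligned}
\]

The denominator is more troublesome:

\[
\|\Delta x\|^2 = \|\Delta x_{\mathcal{B}}\|^2 + 1 = \|B^{-1} N e_j\|^2 + 1.
\]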


To calculate $B^{-1} N e_j$ for every $j \in \mathcal{N}$ is exactly the same as computing the matrix
$B^{-1} N$, which (as we’ve discussed before) is time consuming and therefore a computation
we wish to avoid. But it turns out that we can compute $B^{-1} N e_j$ for every
$j \in \mathcal{N}$ once at the start (when $B$ is essentially, if not identically, an identity matrix)
and then update the norms of these vectors using a simple formula, which we shall
now derive.

Let

\[
\nu_k = \|B^{-1} N e_k\|^2, \qquad k \in \mathcal{N}.
\]


Suppose that we know these numbers, we use them to perform one step of the simplex
method, and we are now at the beginning of the next iteration. As usual, let
us denote quantities in this next iteration by putting tildes on them. For example,
$\tilde{B}$ denotes the new basis matrix. As we’ve seen before, $\tilde{B}$ is related to $B$ by the
equation $\tilde{B} = BE$, where

\[
E^{-1} = I - \frac{(\Delta x_{\mathcal{B}} - e_i) e_i^T}{\Delta x_i}.
\]

Now, let’s compute the new $\nu$ values:

\[
\begin{aligned}
\tilde{\nu}_k &= a_k^T \tilde{B}^{-T} \tilde{B}^{-1} a_k \\
&= a_k^T B^{-T} E^{-T} E^{-1} B^{-1} a_k \\
&= a_k^T B^{-T}
\left( I - \frac{e_i (\Delta x_{\mathcal{B}} - e_i)^T}{\Delta x_i} \right)
\left( I - \frac{(\Delta x_{\mathcal{B}} - e_i) e_i^T}{\Delta x_i} \right)
B^{-1} a_k.
\end{aligned}
\tag{8.8}
\]

Recall from (8.2) that we must compute

\[
v = B^{-T} e_i
\]

in the course of the old iteration. If, in addition, we compute

\[
w = B^{-T} \Delta x_{\mathcal{B}},
\]

then, expanding out the product in (8.8) and expressing the resulting terms using $v$
and $w$, we get the following formula for quickly updating the old $\nu$’s to the new $\nu$’s:

\[
\tilde{\nu}_k = \nu_k
- 2 \, \frac{a_k^T v \, (w - v)^T a_k}{\Delta x_i}
+ (a_k^T v)^2 \, \frac{\|\Delta x_{\mathcal{B}} - e_i\|^2}{(\Delta x_i)^2}.
\]


Recent  computational  studies  using  this  update  formula  have  shown  that  the  steepest- 
edge  rule  for  choosing  the  entering  variable  is  competitive  against,  if  not  superior 
to,  other  pivot  rules. 
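
As a sanity check on the update formula, the following Python sketch compares it with a direct recomputation on a small example using exact rational arithmetic. The matrix, the columns, and all function names are hypothetical choices made for illustration; the sketch assumes that the vector written $\Delta x_{\mathcal{B}}$ in the formula is $B^{-1} a_j$ for the entering column $a_j$, and that the new basis is $B$ with its $i$th column replaced by $a_j$.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M y = rhs exactly by Gauss-Jordan elimination with row swaps."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def updated_norm(B, a_j, a_k, i):
    """New squared norm obtained from the old one via the update formula."""
    dx = solve(B, a_j)                       # assumed: Delta x_B = B^{-1} a_j
    e_i = [Fraction(int(r == i)) for r in range(len(B))]
    v = solve(transpose(B), e_i)             # v = B^{-T} e_i
    w = solve(transpose(B), dx)              # w = B^{-T} Delta x_B
    y = solve(B, a_k)
    nu_k = dot(y, y)                         # old value ||B^{-1} a_k||^2
    akv = dot(a_k, v)
    w_minus_v = [p - q for p, q in zip(w, v)]
    u = [p - q for p, q in zip(dx, e_i)]     # Delta x_B - e_i
    return (nu_k - 2 * akv * dot(w_minus_v, a_k) / dx[i]
            + akv ** 2 * dot(u, u) / dx[i] ** 2)

def direct_norm(B, a_j, a_k, i):
    """New squared norm recomputed from scratch with the new basis matrix."""
    B_new = [row[:] for row in B]
    for r in range(len(B)):
        B_new[r][i] = a_j[r]                 # column i replaced by a_j
    y = solve(B_new, a_k)
    return dot(y, y)

B = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]        # hypothetical basis matrix
a_j, a_k, i = [1, 2, 3], [2, 0, 1], 1        # hypothetical columns and index
```

The point of the formula is that `updated_norm` needs only two extra solves per iteration (for v and w), shared across all k, whereas `direct_norm` needs a fresh solve for every nonbasic column.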


Exercises 


8.1  (a)  Without permuting rows or columns, compute the LU-factorization
of


(8.9) 


2 

1 


5 

1  3 
2 


-1 

(b)  Solve the system $B \Delta x_{\mathcal{B}} = a_j$ where


6 

9  6 
6  4 
4  1 
-3  -1 


\[
a_j = \begin{bmatrix} 0 \\ 2 \\ 1 \\ 3 \\ 0 \end{bmatrix}.
\]


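(c)  Suppose that $\tilde{B}$ is $B$ with its second column replaced by $a_j$. Solve
the system $\tilde{B} \Delta \tilde{x}_{\mathcal{B}} = \tilde{a}_j$ where

\[
\tilde{a}_j = \begin{bmatrix} 1 \\ 0 \\ -1 \\ 0 \\ 0 \end{bmatrix}
\]

using the eta-matrix method.

(d)  Solve the system $\tilde{B} \Delta \tilde{x}_{\mathcal{B}} = \tilde{a}_j$ again, this time using the factorization
updating method.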


8.2  Use the minimum-degree ordering heuristic to find an LU-factorization
of  the  matrix  B  given  by  (8.9). 

8.3  A  permutation  matrix  is  a  matrix  of  zeros  and  ones  for  which  each  row 
has  one  1  and  each  column  has  one  1. 

(a)  Let  B  be  an  m  x  m  matrix,  and  let  P  be  a  permutation  matrix.  Show 
that  PB  is  a  matrix  obtained  by  permuting  the  rows  of  B  and  that 
BP  is  a  matrix  obtained  by  permuting  the  columns  of  B.  Are  the 
two permutations the same?

(b)  Show that every permutation of the rows of a matrix $B$ corresponds
to multiplying $B$ on the left by a permutation matrix.

(c)  Show that for any permutation matrix $P$,

\[
P^{-1} = P^T.
\]

8.4  Explain how to use the factorization $B = LU$ to solve

\[
B^T x = b.
\]


Notes 

Techniques  for  exploiting  sparsity  in  matrix  factorization  have  their  roots  in  the 
paper  by  Markowitz  (1957).  A  few  standard  references  on  matrix  factorization  are 
the books of Duff et al. (1986), Golub and Van Loan (1989), and Gill et al. (1991).
The  eta-matrix  technique  given  in  Section  8.3  for  using  an  old  basis  to  solve  systems 
of  equations  involving  the  current  basis  was  first  described  by  Dantzig  and  Orchard- 
Hayes  (1954).  The  factorization  updating  technique  described  in  Section  8.5  is  the 
method  given  by  Forrest  and  Tomlin  (1972).  The  bump  reduction  techniques  of 
Section 8.6 were first introduced by Saunders (1973) and Reid (1982). The steepest-edge
pivoting rule is due to Goldfarb and Reid (1977). A similar rule, known as
Devex, was given by Harris (1973).


CHAPTER  9 


Problems  in  General  Form 


Up  until  now,  we  have  always  considered  our  problems  to  be  given  in  standard 
form.  However,  for  real-world  problems  it  is  often  convenient  to  formulate  problems 
in  the  following  form: 

\[
\begin{aligned}
\text{maximize}\quad & c^T x \\
\text{subject to}\quad & a \le Ax \le b \\
& l \le x \le u.
\end{aligned}
\tag{9.1}
\]

Two-sided constraints such as those given here are called constraints with ranges.
The vector $l$ is called the vector of lower bounds, and $u$ is the vector of upper
bounds. We allow some of the data to take infinite values; that is, for each
$i = 1, 2, \ldots, m$,

\[
-\infty \le a_i \le b_i \le \infty,
\]

and, for each $j = 1, 2, \ldots, n$,

\[
-\infty \le l_j \le u_j \le \infty.
\]

In  this  chapter,  we  shall  show  how  to  modify  the  simplex  method  to  handle  problems 
presented  in  this  form. 


1.  The  Primal  Simplex  Method 

It  is  easiest  to  illustrate  the  ideas  with  an  example: 


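\[
\begin{aligned}
\text{maximize}\quad & 3x_1 - x_2 \\
\text{subject to}\quad & 1 \le -x_1 + x_2 \le 5 \\
& 2 \le -3x_1 + 2x_2 \le 10 \\
& 2x_1 - x_2 \le 0 \\
& -2 \le x_1 \\
& 0 \le x_2 \le 6.
\end{aligned}
\]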

With this formulation, zero no longer plays the special role it once did. Instead, that
role is replaced by the notion of a variable or a constraint being at its upper or lower
bound. Therefore, instead of defining slack variables for each constraint, we use $w_i$
simply to denote the value of the $i$th constraint:

\[
\begin{aligned}
w_1 &= -x_1 + x_2 \\
w_2 &= -3x_1 + 2x_2 \\
w_3 &= 2x_1 - x_2.
\end{aligned}
\]

The constraints can then be interpreted as upper and lower bounds on these variables.
Now when we record our problem in a dictionary, we will have to keep
explicit track of the upper and lower bound on the original $x_j$ variables and the new
$w_i$ variables. Also, the value of a nonbasic variable is no longer implicit; it could
be at either its upper or its lower bound. Hence, we shall indicate which is the case
by putting a box around the relevant bound. Finally, we need to keep track of the
values of the basic variables. Hence, we shall write our dictionary as follows:

\[
\begin{array}{rrrcrcrcr}
 & & l & & \boxed{-2} & & \boxed{0} & & \\
 & & u & & \infty & & 6 & & \\
\hline
 & & \zeta & = & 3x_1 & - & x_2 & = & -6 \\
1 & 5 & w_1 & = & -x_1 & + & x_2 & = & 2 \\
2 & 10 & w_2 & = & -3x_1 & + & 2x_2 & = & 6 \\
-\infty & 0 & w_3 & = & 2x_1 & - & x_2 & = & -4.
\end{array}
\]

Since all the $w_i$’s are between their upper and lower bounds, this dictionary is feasible.
But it is not optimal, since $x_1$ could be increased from its present value at
the lower bound, thereby increasing the objective function’s value. Hence, $x_1$ shall
be the entering variable for the first iteration. Looking at $w_1$, we see that $x_1$ can be
raised only 1 unit before $w_1$ hits its lower bound. Similarly, $x_1$ can be raised by 4/3
units, at which point $w_2$ hits its lower bound. Finally, if $x_1$ were raised 2 units, then
$w_3$ would hit its upper bound. The tightest of these constraints is the one on $w_1$, and
so $w_1$ becomes the leaving variable, which, in the next iteration, will then be at its
lower bound. Performing the usual row operations, we get

\[
\begin{array}{rrrcrcrcr}
 & & l & & \boxed{1} & & \boxed{0} & & \\
 & & u & & 5 & & 6 & & \\
\hline
 & & \zeta & = & -3w_1 & + & 2x_2 & = & -3 \\
-2 & \infty & x_1 & = & -w_1 & + & x_2 & = & -1 \\
2 & 10 & w_2 & = & 3w_1 & - & x_2 & = & 3 \\
-\infty & 0 & w_3 & = & -2w_1 & + & x_2 & = & -2.
\end{array}
\]

Note, of course, that the objective function value has increased (from $-6$ to $-3$).
For the second iteration, raising $x_2$ from its lower bound will produce an increase
in $\zeta$. Hence, $x_2$ is the entering variable. Looking at the basic variables ($x_1$, $w_2$, and
$w_3$), we see that $w_2$ will be the first variable to hit a bound, namely, its lower bound.
Hence, $w_2$ is the leaving variable, which will become nonbasic at its lower bound:

\[
\begin{array}{rrrcrcrcr}
 & & l & & \boxed{1} & & \boxed{2} & & \\
 & & u & & 5 & & 10 & & \\
\hline
 & & \zeta & = & 3w_1 & - & 2w_2 & = & -1 \\
-2 & \infty & x_1 & = & 2w_1 & - & w_2 & = & 0 \\
0 & 6 & x_2 & = & 3w_1 & - & w_2 & = & 1 \\
-\infty & 0 & w_3 & = & w_1 & - & w_2 & = & -1.
\end{array}
\]

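For the third iteration, $w_1$ is the entering variable, and $w_3$ is the leaving variable,
since it hits its upper bound before any other basic variables hit a bound.
The result is

\[
\begin{array}{rrrcrcrcr}
 & & l & & -\infty & & \boxed{2} & & \\
 & & u & & \boxed{0} & & 10 & & \\
\hline
 & & \zeta & = & 3w_3 & + & w_2 & = & 2 \\
-2 & \infty & x_1 & = & 2w_3 & + & w_2 & = & 2 \\
0 & 6 & x_2 & = & 3w_3 & + & 2w_2 & = & 4 \\
1 & 5 & w_1 & = & w_3 & + & w_2 & = & 2.
\end{array}
\]

Now for the next iteration, note that the coefficients on both $w_3$ and $w_2$ are positive.
But $w_3$ is at its upper bound, and so if it were to change, it would have to decrease.
However, this would mean a decrease in the objective function. Hence, only $w_2$
can enter the basis, in which case $x_2$ is the leaving variable getting set to its upper
bound:

\[
\begin{array}{rrrcrcrcr}
 & & l & & -\infty & & 0 & & \\
 & & u & & \boxed{0} & & \boxed{6} & & \\
\hline
 & & \zeta & = & 1.5w_3 & + & 0.5x_2 & = & 3 \\
-2 & \infty & x_1 & = & 0.5w_3 & + & 0.5x_2 & = & 3 \\
2 & 10 & w_2 & = & -1.5w_3 & + & 0.5x_2 & = & 3 \\
1 & 5 & w_1 & = & -0.5w_3 & + & 0.5x_2 & = & 3.
\end{array}
\]

For this dictionary, both $w_3$ and $x_2$ are at their upper bounds and have positive
coefficients in the formula for $\zeta$. Hence, neither can be moved off from its bound to
increase the objective function. Therefore, the current solution is optimal.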
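
The choice of leaving variable in each iteration above is a ratio test against two-sided bounds. A minimal Python sketch follows; the function name `ratio_test` and its argument layout are hypothetical, and it covers only the case of an entering variable that is being increased.

```python
import math

def ratio_test(values, coefs, lows, highs, enter_room=math.inf):
    """Largest step t >= 0 for an entering variable that is being increased.

    values[i] is the current value of basic variable i, coefs[i] its
    coefficient on the entering variable, and [lows[i], highs[i]] its range.
    enter_room is the distance from the entering variable to its own upper
    bound. Returns (t, index of leaving variable, which bound it hits);
    the index is None if the entering variable reaches its own bound first
    (or if the step is unbounded).
    """
    best = (enter_room, None, None)
    for i, (val, d, lo, hi) in enumerate(zip(values, coefs, lows, highs)):
        if d > 0:
            t, bound = (hi - val) / d, "upper"   # variable increases
        elif d < 0:
            t, bound = (lo - val) / d, "lower"   # variable decreases
        else:
            continue                             # unaffected by the step
        if t < best[0]:
            best = (t, i, bound)
    return best

# First iteration: x1 enters; basic variables are w1, w2, w3.
first = ratio_test([2, 6, -4], [-1, -3, 2], [1, 2, -math.inf], [5, 10, 0])

# Third iteration: w1 enters (room 5 - 1 = 4); basic variables are x1, x2, w3.
third = ratio_test([0, 1, -1], [2, 3, 1], [-2, 0, -math.inf],
                   [math.inf, 6, 0], enter_room=4)
```

In the first call the steps allowed by the three basic variables are 1, 4/3, and 2, so the first basic variable leaves at its lower bound, as in the text.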


2.  The  Dual  Simplex  Method 

The  problem  considered  in  the  previous  section  had  an  initial  dictionary  that 
was  feasible.  But  as  always,  we  must  address  the  case  where  the  initial  dictionary 
is  not  feasible.  That  is,  we  must  define  a  Phase  I  algorithm.  Following  the  ideas 
presented  in  Chapter  5,  we  base  our  Phase  I  algorithm  on  a  dual  simplex  method. 
To  this  end,  we  need  to  introduce  the  dual  of  (9.1).  So  first  we  rewrite  (9.1)  as 

\[
\begin{aligned}
\text{maximize}\quad & c^T x \\
\text{subject to}\quad & Ax \le b \\
& -Ax \le -a \\
& x \le u \\
& -x \le -l,
\end{aligned}
\]

and adding slack variables, we have

\[
\begin{aligned}
\text{maximize}\quad & c^T x \\
\text{subject to}\quad & Ax + f = b \\
& -Ax + p = -a \\
& x + t = u \\
& -x + g = -l \\
& f, p, t, g \ge 0.
\end{aligned}
\]

We see immediately from the inequality form of the primal that the dual can be
written as

\[
\begin{aligned}
\text{minimize}\quad & b^T v - a^T q + u^T s - l^T h \\
\text{subject to}\quad & A^T (v - q) - (h - s) = c \\
& v, q, s, h \ge 0.
\end{aligned}
\tag{9.2}
\]

Furthermore, at optimality, the dual variables are complementary to the corresponding
primal slack variables:

\[
\begin{aligned}
f_i v_i &= 0, \qquad i = 1, 2, \ldots, m, \\
p_i q_i &= 0, \qquad i = 1, 2, \ldots, m, \\
t_j s_j &= 0, \qquad j = 1, 2, \ldots, n, \\
g_j h_j &= 0, \qquad j = 1, 2, \ldots, n.
\end{aligned}
\tag{9.3}
\]


Note that for each $i$, if $b_i > a_i$, then at optimality $v_i$ and $q_i$ must be complementary
to each other. Indeed, if both were positive, then they could be reduced
by an equal amount without destroying feasibility, and the objective function value
would strictly decrease, thereby implying that the supposedly optimal solution is not
optimal. Similarly, if for some $i$, $b_i = a_i$, then it is no longer required that $v_i$ and
$q_i$ be complementary at optimality; but, given an optimal solution for which both
$v_i$ and $q_i$ are positive, we can decrease both these values at the same rate until the
smaller of the two reaches zero, all the while preserving feasibility of the solution
and not changing the objective function value. Hence, there always exists an optimal
solution in which every component of $v$ is complementary to the corresponding
component of $q$. The same argument shows that if there exists an optimal solution,
then there exists one in which all the components of $h$ and $s$ are complementary to
each other as well.

For a real variable $\xi$, its positive part $\xi^+$ is defined as

\[
\xi^+ = \max\{\xi, 0\}
\]

and its negative part $\xi^-$ is defined similarly as

\[
\xi^- = \max\{-\xi, 0\}.
\]

Clearly, both $\xi^+$ and $\xi^-$ are nonnegative. Furthermore, they are complementary,

\[
\xi^+ = 0 \quad \text{or} \quad \xi^- = 0,
\]

and their difference represents $\xi$:

\[
\xi = \xi^+ - \xi^-.
\]

From the complementarity of the components of $v$ against the components of
$q$, we can think of them as the positive and negative parts of the components of just
one vector $y$. So let us write:

\[
v = y^+ \quad \text{and} \quad q = y^-.
\]

Similarly, let us write

\[
h = z^+ \quad \text{and} \quad s = z^-.
\]

If we impose these complementarity conditions not just at optimality but also from
the start, then we can eliminate $v$, $q$, $s$, and $h$ from the dual and write it simply as

\[
\begin{aligned}
\text{minimize}\quad & b^T y^+ - a^T y^- + u^T z^- - l^T z^+ \\
\text{subject to}\quad & A^T y - z = c,
\end{aligned}
\tag{9.4}
\]

where the notation $y^+$ denotes the componentwise positive part of $y$, etc. This problem
is an example from the class of problems called piecewise linear programs.
Usually, piecewise linear programs are solved by converting them into linear programs.
Here, however, we wish to go in the other direction. We shall present an
algorithm for (9.4) that will serve as an algorithm for (9.2). We will call this algorithm
the dual simplex method for problems in general form.

To  economize  on  the  presentation,  we  shall  present  the  dual  simplex  method 
in  the  context  of  a  Phase  I  algorithm  for  linear  programs  in  general  form.  Also, 
to  avoid  cumbersome  notations,  we  shall  present  the  algorithm  with  the  following 
example: 

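\[
\begin{aligned}
\text{maximize}\quad & 2x_1 - x_2 \\
\text{subject to}\quad & 0 \le x_1 + x_2 \le 6 \\
& 2 \le -x_1 + 2x_2 \le 10 \\
& x_1 - x_2 \le 0 \\
& -2 \le x_1 \\
& 1 \le x_2 \le 5.
\end{aligned}
\tag{9.5}
\]

The piecewise linear formulation of the dual is

\[
\begin{aligned}
\text{minimize}\quad & 6y_1^+ + 10y_2^+ + 2z_1^+ - z_2^+ \\
& \quad - 2y_2^- + \infty y_3^- + \infty z_1^- + 5z_2^- \\
\text{subject to}\quad & y_1 - y_2 + y_3 - z_1 = 2 \\
& y_1 + 2y_2 - y_3 - z_2 = -1.
\end{aligned}
\]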


Note  that  the  objective  function  has  coefficients  that  are  infinite.  The  correct 
convention  is  that  infinity  times  a  variable  is  plus  infinity  if  the  variable  is  positive, 
zero  if  the  variable  is  zero,  and  minus  infinity  if  the  variable  is  negative. 

Since  the  objective  function  is  nonlinear  (taking  positive  and  negative  parts  of 
variables  is  certainly  a  nonlinear  operation),  we  will  not  be  able  to  do  the  usual  row 
operations  on  the  objective  function.  Therefore,  in  each  iteration,  we  simply  study 
it  as  is.  But  as  usual,  we  prefer  to  think  in  terms  of  maximization,  and  so  we  record 
the  negative  of  the  objective  function: 

\[
\begin{aligned}
-\xi = {} & -6y_1^+ - 10y_2^+ - 2z_1^+ + z_2^+ \\
& + 2y_2^- - \infty y_3^- - \infty z_1^- - 5z_2^-.
\end{aligned}
\tag{9.6}
\]

We can of course perform row operations on the two constraints, so we set up
the usual sort of dictionary for them:

\[
\begin{aligned}
z_1 &= -2 + y_1 - y_2 + y_3 \\
z_2 &= \phantom{-}1 + y_1 + 2y_2 - y_3.
\end{aligned}
\tag{9.7}
\]

For the dual problem, all the action takes place at zero. That is, slopes in the objective
function change when a variable goes from negative to positive. Since nonbasic
variables are supposed to be set where the action is, we associate a current solution
with each dictionary by setting the nonbasic variables to zero. Hence, the solution
associated with the initial dictionary is

\[
(y_1, y_2, y_3, z_1, z_2) = (0, 0, 0, -2, 1).
\]

The fact that $z_1$ is negative implies that $z_1^-$ is a positive number and hence that the
objective function value associated with this solution is minus infinity. Whenever
the objective function value is minus infinity, we say that the solution is infeasible.
We also refer to the associated dictionary as infeasible. Hence, the initial dictionary
given in (9.7) is infeasible.
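
The infinity convention is easy to get wrong in floating-point arithmetic, where infinity times zero is undefined. The following Python sketch evaluates $-\xi$ from (9.6) for a given choice of $(y_1, y_2, y_3)$; the helper names `pos`, `neg`, `times`, and `neg_dual_objective` are hypothetical, and the data are read off from the example (9.5).

```python
import math

def pos(t):
    """Positive part of a real number."""
    return max(t, 0.0)

def neg(t):
    """Negative part of a real number."""
    return max(-t, 0.0)

def times(coef, value):
    """coef * value, with the convention that infinity times zero is zero."""
    return 0.0 if value == 0 else coef * value

def neg_dual_objective(y, c=(2, -1)):
    """Value of -xi for the example, with z determined by A^T y - z = c."""
    y1, y2, y3 = y
    z = (y1 - y2 + y3 - c[0], y1 + 2 * y2 - y3 - c[1])
    b = [6, 10, 0]
    a = [0, 2, -math.inf]       # third constraint has no lower range
    u = [math.inf, 5]           # x1 has no upper bound
    l = [-2, 1]
    xi = 0.0
    for bi, ai, yi in zip(b, a, y):
        xi += times(bi, pos(yi)) - times(ai, neg(yi))
    for uj, lj, zj in zip(u, l, z):
        xi += times(uj, neg(zj)) - times(lj, pos(zj))
    return -xi

at_origin = neg_dual_objective((0, 0, 0))   # z1 = -2 < 0, so -xi is -infinity
elsewhere = neg_dual_objective((2, 0, 0))   # z = (0, 3), a finite value
```

Setting all three nonbasic variables to zero reproduces the infeasibility just described, while a point such as (2, 0, 0), chosen here only for illustration, gives a finite objective value.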

The  dual  simplex  method  must  start  with  a  dual  feasible  solution.  But  since  we 
intend  to  use  the  dual  simplex  method  simply  to  find  a  feasible  solution  for  (9.5),  we 
are  free  to  change  the  objective  function  in  (9.5)  any  way  we  please.  In  particular, 
we  can  change  it  from 

\[
\zeta = 2x_1 - x_2
\]

to

\[
\eta = -2x_1 - x_2.
\]

Making that change to the primal leaves the dual objective function unchanged, but
produces a feasible dual dictionary:

\[
\begin{aligned}
z_1 &= 2 + y_1 - y_2 + y_3 \\
z_2 &= 1 + y_1 + 2y_2 - y_3.
\end{aligned}
\tag{9.8}
\]

For comparison purposes, let us also record the corresponding primal dictionary.
It is easy to write down the equations defining the $w_i$’s, but how do we know
whether the $x_j$’s are supposed to be at their upper or their lower bounds? The answer
comes from the requirement that the primal and dual satisfy the complementarity
conditions given in (9.3). Indeed, from the dual dictionary we see that $z_1 = 2$.
Hence, $z_1^+ = 2$. But since $z_1^+$ is just a surrogate for $h_1$, we see that $h_1$ is positive
and hence that $g_1$ must be zero. This means that $x_1$ must be at its lower bound. Similarly,
for the sake of complementarity, $x_2$ must also be at its lower bound. Hence,
the primal dictionary is

\[
\begin{array}{rrrcrcrcr}
 & & l & & \boxed{-2} & & \boxed{1} & & \\
 & & u & & \infty & & 5 & & \\
\hline
 & & \eta & = & -2x_1 & - & x_2 & = & 3 \\
0 & 6 & w_1 & = & x_1 & + & x_2 & = & -1 \\
2 & 10 & w_2 & = & -x_1 & + & 2x_2 & = & 4 \\
-\infty & 0 & w_3 & = & x_1 & - & x_2 & = & -3.
\end{array}
\]

Note that it is infeasible, since $w_1$ is not between its upper and lower bounds.

We are now ready to describe the first iteration of the dual simplex method.
To this end, we ask whether we can improve the dual objective function value by
moving one of the nonbasic variables ($y_1$, $y_2$, or $y_3$) away from zero. Of course,
each of these three variables can be moved either to the positive or the negative side
of zero; we must analyze these six cases individually. First of all, note that since
$z_1$ is positive at the current solution, it follows that $z_1^+ = z_1$ and $z_1^- = 0$ in a
neighborhood of the current solution. A similar statement can be made for $z_2$, and
so we can rewrite (9.6) locally around the current solution as

\[
-\xi = -6y_1^+ - 10y_2^+ - 2z_1 + z_2 + 2y_2^- - \infty y_3^-.
\]

Now, as $y_1$ is increased from zero, the rate of increase of $-\xi$ is simply the derivative
of the right-hand side with respect to $y_1$, where we must keep in mind that $z_1$ and $z_2$
are functions of $y_1$ via the dictionary (9.8). Hence, the rate of increase is
$-6 - 2 + 1 = -7$; i.e., the objective function decreases at a rate of 7 units per unit increase
of $y_1$. If, on the other hand, $y_1$ is decreased from zero into negative territory, then
the rate of increase of $-\xi$ is the negative of the derivative of the right-hand side. In
this case we get no contribution from $y_1^-$, but we do get something from $z_1$ and $z_2$
for a total of $2 - 1 = 1$. Hence, the rate of increase as we move in this direction is
one unit increase per unit move. We can analyze changes to $y_2$ and $y_3$ in the same way. The entire
situation can be summarized as follows:

\[
\begin{array}{lrcr}
y_1 \nearrow & -6 - 2 + 1 & = & -7 \\
y_1 \searrow & 0 + 2 - 1 & = & 1 \\
y_2 \nearrow & -10 + 2 + 2 & = & -6 \\
y_2 \searrow & 2 - 2 - 2 & = & -2 \\
y_3 \nearrow & 0 - 2 - 1 & = & -3 \\
y_3 \searrow & -\infty + 2 + 1 & = & -\infty.
\end{array}
\]

Of these six cases, the only one that brings about an increase in $-\xi$ is the one in
which $y_1$ is sent negative. Hence, $y_1$ shall be our entering variable, and it will go
negative. To find the leaving variable, we must ask: as $y_1$ goes negative, which of
$z_1$ and $z_2$ will hit zero first? For the current dictionary, $z_2$ gets to zero first and
so becomes the leaving variable. Performing the usual row operations, the new
dictionary for the dual problem is

\[
\begin{aligned}
z_1 &= \phantom{-}1 + z_2 - 3y_2 + 2y_3 \\
y_1 &= -1 + z_2 - 2y_2 + y_3.
\end{aligned}
\]

Let us have a look at the new primal dictionary. The fact that $y_1$ was the entering
variable in the dual dictionary implies that $w_1$ is the leaving variable in the primal.
Furthermore, the fact that $y_1$ has gone negative implies that $y_1^-$ is now positive, and
so complementarity then demands that $p_1$ be zero; i.e., $w_1$ should go to its lower
bound. The fact that $z_2$ was the leaving variable in the dual dictionary implies that
$x_2$ is the entering variable in the primal. Hence, the new primal dictionary is

\[
\begin{array}{rrrcrcrcr}
 & & l & & \boxed{-2} & & \boxed{0} & & \\
 & & u & & \infty & & 6 & & \\
\hline
 & & \eta & = & -x_1 & - & w_1 & = & 2 \\
1 & 5 & x_2 & = & -x_1 & + & w_1 & = & 2 \\
2 & 10 & w_2 & = & -3x_1 & + & 2w_1 & = & 6 \\
-\infty & 0 & w_3 & = & 2x_1 & - & w_1 & = & -4.
\end{array}
\]

We are now ready to begin the second iteration. Therefore, we ask which nonbasic
variable should be moved away from zero (and in which direction). As before,
we first note that $z_1$ positive implies that $z_1^+ = z_1$ and $z_1^- = 0$ and that $y_1$ negative
implies that $y_1^+ = 0$ and $y_1^- = -y_1$. Hence, the objective function can be written
locally around the current solution as

\[
-\xi = -10y_2^+ - 2z_1 + z_2^+ + 2y_2^- - \infty y_3^- - 5z_2^-.
\]

We now summarize the possibilities in a small table:

\[
\begin{array}{lrcr}
z_2 \nearrow & 1 - 2 & = & -1 \\
z_2 \searrow & -5 + 2 & = & -3 \\
y_2 \nearrow & -10 + 6 & = & -4 \\
y_2 \searrow & 2 - 6 & = & -4 \\
y_3 \nearrow & 0 - 4 & = & -4 \\
y_3 \searrow & -\infty + 4 & = & -\infty.
\end{array}
\]

Note  that  all  the  changes  are  negative,  meaning  that  there  are  no  possibilities  to 
increase  the  objective  function  any  further.  That  is,  the  current  dual  solution  is 
optimal.  Of  course,  this  also  could  have  been  deduced  by  observing  that  the  primal 
dictionary  is  feasible  (which  is  what  we  are  looking  for,  after  all). 

Even  though  this  example  of  the  dual  simplex  method  has  terminated  after  only 
one  iteration,  it  should  be  clear  how  to  proceed  had  it  not  terminated. 
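
The six rates in the first-iteration table can be recomputed mechanically from the data of (9.5) and the dictionary (9.8). The Python sketch below is one way to do so; the function name `rates` is hypothetical, and the sketch assumes, as in that iteration, that both $z_1$ and $z_2$ are positive so that each contributes through its lower-bound coefficient.

```python
import math

b = [6, 10, 0]              # upper ranges: coefficients of y_i^+ in xi
a = [0, 2, -math.inf]       # lower ranges: coefficients of y_i^- in -xi
l = [-2, 1]                 # lower bounds: slope of -xi in z_j when z_j > 0
D = [[1, -1, 1],            # z1 = 2 + y1 - y2 + y3
     [1, 2, -1]]            # z2 = 1 + y1 + 2*y2 - y3

def rates(k):
    """Rates of increase of -xi as y_k moves up from zero and down from zero."""
    # Effect transmitted through the basic variables z1 and z2.
    through_z = sum(l[j] * D[j][k] for j in range(2))
    up = -b[k] + through_z      # y_k becomes positive
    down = a[k] - through_z     # y_k becomes negative
    return up, down

table = [rates(k) for k in range(3)]
```

Only the second entry for $y_1$ is positive, matching the choice of $y_1$ as the entering variable moving in the negative direction.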

Now  that  we  have  a  feasible  solution  for  the  primal,  we  could  solve  the  problem 
to  optimality  by  simply  reinstating  the  original  objective  function  and  proceeding 
by  applying  the  primal  simplex  method  in  a  Phase  II  procedure  to  find  the  optimal 
solution.  Since  the  primal  simplex  method  has  already  been  discussed,  we  stop  here 
on  this  problem. 

Exercises 

Solve  the  following  linear  programming  problems: 

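9.1

\[
\begin{aligned}
\text{maximize}\quad & -x_1 + x_2 \\
\text{subject to}\quad & -x_1 + x_2 \le 5 \\
& x_1 - 2x_2 \le 9 \\
& 0 \le x_1 \le 6 \\
& 0 \le x_2 \le 8.
\end{aligned}
\]

9.2

\[
\begin{aligned}
\text{maximize}\quad & -3x_1 - x_2 + x_3 + 2x_4 - x_5 + x_6 - x_7 - 4x_8 \\
\text{subject to}\quad & x_1 + 4x_3 + x_4 - 5x_5 - 2x_6 + 3x_7 - 6x_8 = 7 \\
& x_2 - 3x_3 - x_4 + 4x_5 + x_6 - 2x_7 + 5x_8 = -3 \\
& 0 \le x_1 \le 8 \\
& 0 \le x_2 \le 6 \\
& 0 \le x_3 \le 10 \\
& 0 \le x_4 \le 15 \\
& 0 \le x_5 \le 2 \\
& 0 \le x_6 \le 10 \\
& 0 \le x_7 \le 4 \\
& 0 \le x_8 \le 3.
\end{aligned}
\]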


Notes 

Dantzig  (1955)  was  the  first  to  consider  variants  of  the  simplex  method  that 
handle  bounds  and  ranges  implicitly. 


CHAPTER  10 


Convex  Analysis 


This book is mostly about linear programming. However, this subject, important
as it is, is just a subset of a larger subject called convex analysis. In this chapter,
we  shall  give  a  brief  introduction  to  this  broader  subject.  In  particular,  we  shall  prove 
a  few  of  the  fundamental  results  of  convex  analysis  and  see  that  their  proofs  depend 
on  some  of  the  theory  of  linear  programming  that  we  have  already  developed. 

1.  Convex  Sets 

Given a finite set of points, $z_1, \ldots, z_n$, in $\mathbb{R}^m$, a point $z$ in $\mathbb{R}^m$ is called a
convex combination of these points if$^1$

\[
z = \sum_{j=1}^{n} t_j z_j,
\]

where $t_j \ge 0$ for each $j$ and $\sum_{j=1}^{n} t_j = 1$. It is called a strict convex combination if
none of the $t_j$’s vanish. For $n = 2$, the set of all convex combinations of two points
is simply the line segment connecting them.

A subset $S$ of $\mathbb{R}^m$ is called convex if, for every $x$ and $y$ in $S$, $S$ also contains all
points on the line segment connecting $x$ and $y$. That is, $tx + (1 - t)y \in S$, for every
$0 \le t \le 1$. See Figure 10.1.

Certain elementary properties of convex sets are trivial to prove. For example,
the intersection of an arbitrary collection of convex sets is convex. Indeed, let $S_\alpha$,
$\alpha \in I$, denote a collection of convex sets indexed by some set $I$. Then the claim is
that $\bigcap_{\alpha \in I} S_\alpha$ is convex. To see this, consider an arbitrary pair of points $x$ and $y$ in
the intersection. It follows that $x$ and $y$ are in each $S_\alpha$. By the convexity of $S_\alpha$ it
follows that $S_\alpha$ contains the line segment connecting $x$ and $y$. Since each of these
sets contains the line segment, so does their intersection. Hence, the intersection is
convex.

Here  is  another  easy  one: 

THEOREM 10.1. A set $C$ is convex if and only if it contains all convex combinations
of points in $C$.


1  Until now we’ve used subscripts for the components of a vector. In this chapter, subscripts will be
used to list sequences of vectors. Hopefully, this will cause no confusion.

Figure 10.1. The set on the left is convex: for any pair of points
in the set, the line segment connecting the two points is also contained
in the set. The set on the right is not convex: there exist
pairs of points, such as the $x$ and $y$ shown, for which the connecting
line segment is not entirely in the set.


Proof. Let $C$ be a convex set. By definition, $C$ contains all convex combinations
of pairs of points in $C$. The first nontrivial step is to show that $C$ contains all
convex combinations of triples of points in $C$. To see this, fix $z_1$, $z_2$, and $z_3$ in $C$
and consider

\[
z = t_1 z_1 + t_2 z_2 + t_3 z_3,
\]

where $t_j \ge 0$ for each $j$ and $\sum_{j=1}^{3} t_j = 1$. If any of the $t_j$’s vanish, then $z$ is really
just a convex combination of two points and so belongs to $C$. Hence, suppose that
each of the $t_j$’s is strictly positive. Rewrite $z$ as follows:

\[
\begin{aligned}
z &= (1 - t_3) \left( \frac{t_1}{1 - t_3} z_1 + \frac{t_2}{1 - t_3} z_2 \right) + t_3 z_3 \\
&= (1 - t_3) \left( \frac{t_1}{t_1 + t_2} z_1 + \frac{t_2}{t_1 + t_2} z_2 \right) + t_3 z_3.
\end{aligned}
\]

Since $C$ contains all convex combinations of pairs of points, it follows that

\[
\frac{t_1}{t_1 + t_2} z_1 + \frac{t_2}{t_1 + t_2} z_2 \in C.
\]

Now, since $z$ is a convex combination of the two points $\frac{t_1}{t_1 + t_2} z_1 + \frac{t_2}{t_1 + t_2} z_2$ and $z_3$,
both of which belong to $C$, it follows that $z$ is in $C$. It is easy to see (pun intended)
that this argument can be extended to an inductive proof that $C$ contains all convex
combinations of finite collections of points in $C$. Indeed, one must simply show that
the fact that $C$ contains all convex combinations of $n$ points from $C$ implies that it
contains all convex combinations of $n + 1$ points from $C$. We leave the details to
the reader.

Of  course,  proving  that  a  set  is  convex  if  it  contains  every  convex  combination 
of  its  points  is  trivial:  simply  take  convex  combinations  of  pairs  to  get  that  it  is 
convex.  □ 
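
The inductive step in the proof amounts to folding a convex combination of several points into a chain of pairwise ones. A small Python sketch of that regrouping, using exact rationals, is given below; the function names `pair` and `combine` and the sample triangle are hypothetical illustrations, not part of the proof.

```python
from fractions import Fraction

def pair(t, x, y):
    """The point t*x + (1 - t)*y on the segment between x and y."""
    return tuple(t * a + (1 - t) * b for a, b in zip(x, y))

def combine(points, weights):
    """Build sum_j t_j z_j using only pairwise convex combinations.

    After each step, acc is the convex combination of the points seen so
    far with weights rescaled to sum to one, and mass is their total
    original weight. The weight mass / new_mass always lies in [0, 1].
    """
    acc, mass = points[0], weights[0]
    for p, t in zip(points[1:], weights[1:]):
        if t == 0:
            continue                      # a vanishing weight adds nothing
        new_mass = mass + t
        acc = pair(mass / new_mass, acc, p)
        mass = new_mass
    return acc

pts = [(0, 0), (4, 0), (0, 4)]
wts = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
z = combine(pts, wts)
```

For three points this is exactly the rewriting used above: the first two points are combined with weights proportional to $t_1$ and $t_2$, and the result is then combined with the third point using weight $t_3$.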


For each set $S$ in $\mathbb{R}^m$ (not necessarily convex), there exists a smallest convex
set, which we shall denote by $\operatorname{conv}(S)$, containing $S$. It is defined, quite simply,
as the intersection of all convex sets containing $S$. From our discussion about intersections,
it follows that this set is convex. The set $\operatorname{conv}(S)$ is called the convex
hull of $S$. This definition can be thought of as a definition from the “outside,” since
it involves forming the intersection of a collection of sets that contain $S$. Our next
theorem gives a characterization of convex sets from the “inside”:

Theorem 10.2. The convex hull $\operatorname{conv}(S)$ of a set $S$ in $\mathbb{R}^m$ consists precisely of
the set of all convex combinations of finite collections of points from $S$.

Proof. Let $H$ denote the set of all convex combinations of finite sets of points
in $S$:

\[
H = \left\{ z = \sum_{j=1}^{n} t_j z_j \;:\; n \ge 1,\ z_j \in S \text{ and } t_j > 0 \text{ for all } j, \text{ and } \sum_{j=1}^{n} t_j = 1 \right\}.
\]

It suffices to show that (1) $H$ contains $S$, (2) $H$ is convex, and (3) every convex set
containing $S$ also contains $H$.

To see that $H$ contains $S$, just take $n = 1$ in the definition of $H$.

To see that $H$ is convex, fix two points $x$ and $y$ in $H$ and a real number $0 < t < 1$.
We must show that $z = tx + (1 - t)y \in H$. The fact that $x \in H$ implies that
$x = \sum_{j=1}^{r} p_j x_j$, for some $r \ge 1$, where $p_j > 0$ for $j = 1, 2, \ldots, r$, $\sum_{j=1}^{r} p_j = 1$,
and $x_j \in S$ for $j = 1, 2, \ldots, r$. Similarly, the fact that $y$ is in $H$ implies that
$y = \sum_{j=1}^{s} q_j y_j$, for some $s \ge 1$, where $q_j > 0$ for $j = 1, 2, \ldots, s$, $\sum_{j=1}^{s} q_j = 1$,
and $y_j \in S$ for $j = 1, 2, \ldots, s$. Hence,

\[
z = tx + (1 - t)y = \sum_{j=1}^{r} t p_j x_j + \sum_{j=1}^{s} (1 - t) q_j y_j.
\]

Since the coefficients $(tp_1, \ldots, tp_r, (1 - t)q_1, \ldots, (1 - t)q_s)$ are all positive and
sum to one, it follows that this last expression for $z$ is a convex combination of $r + s$
points from $S$. Hence, $z$ is in $H$. Since $x$ and $y$ were arbitrary points in $H$ and $t$
was an arbitrary real number between zero and one, the fact that $z \in H$ implies that
$H$ is convex.

It  remains  simply  to  show  that  H  is  contained  in  every  convex  set  containing 
S.  Let  C  be  such  a  set  (i.e.,  convex  and  containing  S).  From  Theorem  10.1  and  the 
fact  that  C  contains  S,  it  follows  that  C  contains  all  convex  combinations  of  points 
in  S.  Hence,  C  contains  H.  □ 

2.  Caratheodory’s  Theorem 

In the previous section, we showed that the convex hull of a set S can be constructed
by forming all convex combinations of finite sets of points from S. In 1907,
Caratheodory  showed  that  it  is  not  necessary  to  use  all  finite  sets.  Instead,  m  +  1 
points  suffice: 

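Theorem 10.3. The convex hull conv(S) of a set S in $\mathbb{R}^m$ consists of all convex
combinations of m + 1 points from S:

$$\operatorname{conv}(S) = \left\{ z = \sum_{j=1}^{m+1} t_j z_j \;:\; z_j \in S \text{ and } t_j \ge 0 \text{ for all } j, \text{ and } \sum_{j} t_j = 1 \right\}.$$
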
Proof. Let H denote the set on the right. From Theorem 10.2, we see that H
is contained in conv(S). Therefore, it suffices to show that every point in conv(S)
belongs to H. To this end, fix a point z in conv(S). By Theorem 10.2, there exists
a collection of, say, n points $z_1, z_2, \ldots, z_n$ in S and associated nonnegative
multipliers $t_1, t_2, \ldots, t_n$ summing to one such that

$$z = \sum_{j=1}^{n} t_j z_j. \tag{10.1}$$


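Let A denote the matrix consisting of the points $z_1, z_2, \ldots, z_n$ as the columns of A:

$$A = \begin{bmatrix} z_1 & z_2 & \cdots & z_n \end{bmatrix}.$$

Also, let $x^*$ denote the vector consisting of the multipliers $t_1, t_2, \ldots, t_n$:

$$x^* = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{bmatrix}.$$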

Finally,  let  b  =  z.  Then  from  (10.1),  we  see  that  x*  is  feasible  for  the  following 
linear  programming  problem: 


$$\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & Ax = b \\
 & e^T x = 1 \\
 & x \ge 0.
\end{array} \tag{10.2}$$


The fundamental theorem of linear programming (Theorem 3.4) tells us that every
feasible linear program has a basic feasible solution. For such a solution, only
the basic variables can be nonzero. The number of basic variables in (10.2)
coincides with the number of equality constraints; that is, there are at most m + 1
variables that are nonzero. Hence, this basic feasible solution corresponds to a
convex combination of just m + 1 of the original n points. (See Exercise 10.5.) □
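
The reduction to at most m + 1 points can also be carried out numerically. The following is a minimal sketch, not the linear programming argument used in the proof above: it uses the classical elimination argument instead (repeatedly find an affine dependence among the points and shift weight along it until some weight vanishes). The function names `null_vector` and `caratheodory_reduce` are hypothetical, and exact rational arithmetic is assumed so that zero weights are detected reliably.

```python
from fractions import Fraction

def null_vector(rows, ncols):
    """Return a nonzero v with (rows) v = 0, or None if only v = 0 works."""
    M = [[Fraction(a) for a in row] for row in rows]
    pivots = []          # pivot column of each reduced row
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [a / M[r][c] for a in M[r]]          # scale pivot to 1
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    free = next((c for c in range(ncols) if c not in pivots), None)
    if free is None:
        return None
    v = [Fraction(0)] * ncols
    v[free] = Fraction(1)
    for row, pc in zip(M, pivots):
        v[pc] = -row[free]
    return v

def caratheodory_reduce(points, weights):
    """Rewrite sum_j t_j z_j as a convex combination of at most m + 1 points."""
    pts = [tuple(Fraction(a) for a in p) for p in points]
    wts = [Fraction(w) for w in weights]
    m = len(pts[0])
    while len(pts) > m + 1:
        # Affine dependence: sum_j lam_j z_j = 0 and sum_j lam_j = 0.
        rows = [[p[k] for p in pts] for k in range(m)] + [[1] * len(pts)]
        lam = null_vector(rows, len(pts))
        # Largest step that keeps every weight nonnegative.
        alpha = min(w / l for w, l in zip(wts, lam) if l > 0)
        wts = [w - alpha * l for w, l in zip(wts, lam)]
        keep = [i for i, w in enumerate(wts) if w > 0]
        pts = [pts[i] for i in keep]
        wts = [wts[i] for i in keep]
    return pts, wts

# The center of the unit square, written with all four corners (m = 2).
corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
pts, wts = caratheodory_reduce(corners, [Fraction(1, 4)] * 4)
print(pts, wts)
```

Each pass removes at least one point while leaving the combined point z unchanged, so the loop stops with at most m + 1 points.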


It is easy to see that the number m + 1 is the best possible. For example, the
point $(1/(m + 1), 1/(m + 1), \ldots, 1/(m + 1))$ in $\mathbb{R}^m$ belongs to the convex hull of
the m + 1 points $e_1, e_2, \ldots, e_m, 0$ but is not a convex combination of any proper subset of
them.


3.  The  Separation  Theorem 

We shall define a halfspace of $\mathbb{R}^n$ to be any set given by a single (nontrivial)
linear inequality:

$$\left\{ x \in \mathbb{R}^n : \sum_{j=1}^{n} a_j x_j \le b \right\}, \qquad (a_1, a_2, \ldots, a_n) \ne 0. \tag{10.3}$$

Every halfspace is convex. To see this, suppose that $x = (x_1, x_2, \ldots, x_n)$ and
$y = (y_1, y_2, \ldots, y_n)$ both satisfy the linear inequality in (10.3). Fix t between
zero and one. Then both t and 1 - t are nonnegative, and so multiplying by them
preserves the direction of inequality. Therefore, multiplying $\sum_j a_j x_j \le b$ by t and
$\sum_j a_j y_j \le b$ by 1 - t and then adding, we get

$$\sum_{j} a_j \left( t x_j + (1 - t) y_j \right) \le b.$$

That  is,  tx  +  (1  —  t)y  also  satisfies  the  inequality  defining  the  halfspace. 

If we allow the vector of coefficients $(a_1, a_2, \ldots, a_n)$ in the definition of a
halfspace to vanish, then we call the set so defined a generalized halfspace. It is
easy to see that every generalized halfspace is simply a halfspace, all of $\mathbb{R}^n$, or the
empty set. Also, every generalized halfspace is clearly convex.

A polyhedron is defined as the intersection of a finite collection of generalized
halfspaces. That is, a polyhedron is any set of the form

$$\left\{ x \in \mathbb{R}^n : \sum_{j=1}^{n} a_{ij} x_j \le b_i,\ i = 1, 2, \ldots, m \right\}.$$

Every  polyhedron,  being  the  intersection  of  a  collection  of  convex  sets,  is  convex. 
The  following  theorem  is  called  the  Separation  Theorem  for  polyhedra. 

Theorem 10.4. Let $P$ and $\tilde P$ be two disjoint nonempty polyhedra in $\mathbb{R}^n$. Then
there exist disjoint halfspaces $H$ and $\tilde H$ such that $P \subset H$ and $\tilde P \subset \tilde H$.

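Proof. Suppose that $P$ and $\tilde P$ are given by the following systems of
inequalities:

$$P = \{x : Ax \le b\}, \qquad \tilde P = \{x : \tilde A x \le \tilde b\}.$$

The disjointness of $P$ and $\tilde P$ implies that there is no solution to the system

$$\begin{bmatrix} A \\ \tilde A \end{bmatrix} x \le \begin{bmatrix} b \\ \tilde b \end{bmatrix}. \tag{10.4}$$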


To  continue  the  proof,  we  need  a  result  known  as  Farkas’  Lemma,  which  says  that 
$Ax \le b$ has no solutions if and only if there is an m-vector y such that

$$A^T y = 0, \qquad y \ge 0, \qquad b^T y < 0.$$

We  shall  prove  this  result  in  the  next  section.  For  now,  let  us  apply  Farkas’  Lemma 
to  the  situation  at  hand.  Indeed,  the  fact  that  there  are  no  solutions  to  (10.4)  implies 
that  there  exists  a  vector,  which  we  shall  write  in  block  form  as 


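$$\begin{bmatrix} y \\ \tilde y \end{bmatrix},$$

such that

$$\begin{bmatrix} A^T & \tilde A^T \end{bmatrix} \begin{bmatrix} y \\ \tilde y \end{bmatrix} = A^T y + \tilde A^T \tilde y = 0 \tag{10.5}$$

$$\begin{bmatrix} y \\ \tilde y \end{bmatrix} \ge 0 \tag{10.6}$$

$$\begin{bmatrix} b^T & \tilde b^T \end{bmatrix} \begin{bmatrix} y \\ \tilde y \end{bmatrix} = b^T y + \tilde b^T \tilde y < 0. \tag{10.7}$$
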


From the last condition, we see that either $b^T y < 0$ or $\tilde b^T \tilde y < 0$ (or both). Without
loss of generality, we may assume that

$$b^T y < 0.$$

Farkas' Lemma (this time applied in the other direction) together with the nonemptiness
of $P$ now implies that

$$A^T y \ne 0.$$

Put

$$H = \left\{ x : (A^T y)^T x \le b^T y \right\} \quad \text{and} \quad \tilde H = \left\{ x : (A^T y)^T x \ge -\tilde b^T \tilde y \right\}.$$

These  sets  are  clearly  halfspaces.  To  finish  the  proof,  we  must  show  that  they  are 
disjoint  and  contain  their  corresponding  polyhedra. 

First of all, it follows from (10.7) that $H$ and $\tilde H$ are disjoint. Indeed, suppose
that $x \in H$. Then $(A^T y)^T x \le b^T y < -\tilde b^T \tilde y$, which implies that x is not in $\tilde H$.

To show that $P \subset H$, fix x in $P$. Then $Ax \le b$. Since $y \ge 0$ (as we know from
(10.6)), it follows then that $y^T A x \le y^T b$. But this is exactly the condition that says
that x belongs to $H$. Since x was an arbitrary point in $P$, it follows that $P \subset H$.

Showing that $\tilde P$ is a subset of $\tilde H$ is similar. Indeed, suppose that $x \in \tilde P$. Then
$\tilde A x \le \tilde b$. Multiplying on the left by $-\tilde y^T$ and noting that $\tilde y \ge 0$, we see that
$-\tilde y^T \tilde A x \ge -\tilde y^T \tilde b$. But from (10.5) we see that $-\tilde y^T \tilde A x = y^T A x$, and so this last
inequality is exactly the condition that $x \in \tilde H$. Again, the arbitrariness of $x \in \tilde P$
implies that $\tilde P \subset \tilde H$, and the proof is complete. □


4.  Farkas’  Lemma 

The  following  result,  known  as  Farkas’  Lemma,  played  a  fundamental  role  in 
the  proof  of  the  separation  theorem  of  the  preceding  section  (Theorem  10.4).  In  this 
section,  we  state  it  formally  as  a  lemma  and  give  its  proof. 

Lemma 10.5. The system $Ax \le b$ has no solutions if and only if there is a y
such that

$$A^T y = 0, \qquad y \ge 0, \qquad b^T y < 0. \tag{10.8}$$

Proof. Consider the linear program

$$\begin{array}{ll}
\text{maximize} & 0 \\
\text{subject to} & Ax \le b
\end{array}$$

and its dual

$$\begin{array}{ll}
\text{minimize} & b^T y \\
\text{subject to} & A^T y = 0 \\
 & y \ge 0.
\end{array}$$


Clearly,  the  dual  is  feasible  (just  take  y  =  0).  So  if  the  primal  is  feasible,  then  the 
dual  is  bounded.  Also,  if  the  primal  is  infeasible,  the  dual  must  be  unbounded.  That 
is,  the  primal  is  infeasible  if  and  only  if  the  dual  is  unbounded.  To  finish  the  proof, 
we  claim  that  the  dual  is  unbounded  if  and  only  if  there  exists  a  solution  to  (10.8). 
Indeed, suppose that the dual is unbounded. The dual simplex method is guaranteed
to prove that it is unbounded, and it does so as follows. At the last iteration, a step
direction $\Delta y$ is computed that preserves feasibility, i.e.,

$$A^T \Delta y = 0,$$

is a descent direction for the objective function, i.e.,

$$b^T \Delta y < 0,$$

and is a direction for which the step length is unbounded, i.e.,

$$\Delta y \ge 0.$$

But these three properties show that $\Delta y$ is the solution to (10.8) that we were looking
for. Conversely, suppose that there is a solution to (10.8). Call it $\Delta y$. It is easy to
see that starting from y = 0, this step direction provides an unbounded decrease in
the objective function. This completes the proof. □
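
A vector y satisfying (10.8) acts as a certificate of infeasibility that is easy to check by direct computation. The following is a small illustrative sketch; the function name `is_farkas_certificate` and the example system are assumptions chosen for illustration and do not come from the text.

```python
def is_farkas_certificate(A, b, y):
    """Check the three conditions of (10.8): A^T y = 0, y >= 0, b^T y < 0."""
    m, n = len(A), len(A[0])
    ATy = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
    return (all(v == 0 for v in ATy)
            and all(yi >= 0 for yi in y)
            and sum(bi * yi for bi, yi in zip(b, y)) < 0)

# Illustrative infeasible system in one variable: x <= 1 and -x <= -2 (i.e., x >= 2).
A = [[1], [-1]]
b = [1, -2]
print(is_farkas_certificate(A, b, [1, 1]))   # adding the two rows gives 0 <= -1
print(is_farkas_certificate(A, b, [1, 0]))   # fails: A^T y is not zero
```

Adding the two inequalities with weights y = (1, 1) yields the contradiction 0 ≤ -1, which is exactly what the three conditions encode.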


5.  Strict  Complementarity 

In this section, we consider the usual inequality-form linear programming problem,
which we write with its slack variables shown explicitly:

$$\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & Ax + w = b \\
 & x, w \ge 0.
\end{array} \tag{10.9}$$

As we know, the dual can be written as follows:

$$\begin{array}{ll}
\text{minimize} & b^T y \\
\text{subject to} & A^T y - z = c \\
 & y, z \ge 0.
\end{array} \tag{10.10}$$

In our study of duality theory in Chapter 5, we saw that every pair of optimal solutions
to these two problems possesses a property called complementary slackness. If
$(x^*, w^*)$ denotes an optimal solution to the primal and $(y^*, z^*)$ denotes an optimal
solution to the dual, then the complementary slackness theorem says that, for each
j = 1, 2, ..., n, either $x_j^* = 0$ or $z_j^* = 0$ (or both) and, for each i = 1, 2, ..., m,
either $y_i^* = 0$ or $w_i^* = 0$ (or both). In this section, we shall prove that there are
optimal pairs of solutions for which the parenthetical "or both" statements don't
happen. That is, there are optimal solutions for which exactly one member of each
pair $(x_j^*, z_j^*)$ vanishes and exactly one member from each pair $(y_i^*, w_i^*)$ vanishes.
In such cases, we say that the optimal solutions are strictly complementary to each
other. The strictness of the complementary slackness is often expressed by saying
that $x^* + z^* > 0$ and $y^* + w^* > 0$.²
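
The difference between complementary and strictly complementary pairs can be seen on a tiny problem. The sketch below is illustrative only: the helper names and the example (maximize $x_1 + x_2$ subject to $x_1 + x_2 \le 1$, $x \ge 0$) are assumptions, not taken from the text. For that problem the dual optimum is y = 1 with dual slacks z = (0, 0), and every point on the segment $x_1 + x_2 = 1$, $x \ge 0$ is primal optimal with w = 0.

```python
from fractions import Fraction

def is_complementary(x, z, y, w):
    """x_j z_j = 0 for all j and y_i w_i = 0 for all i."""
    return (all(a * b == 0 for a, b in zip(x, z))
            and all(a * b == 0 for a, b in zip(y, w)))

def is_strictly_complementary(x, z, y, w):
    """Complementary, and in addition x + z > 0 and y + w > 0 componentwise."""
    return (is_complementary(x, z, y, w)
            and all(a + b > 0 for a, b in zip(x, z))
            and all(a + b > 0 for a, b in zip(y, w)))

y, z, w = [1], [0, 0], [0]              # dual optimum and primal slack
vertex = [1, 0]                         # an optimal vertex
midpoint = [Fraction(1, 2)] * 2         # an optimal point in the relative interior

print(is_complementary(vertex, z, y, w), is_strictly_complementary(vertex, z, y, w))
print(is_strictly_complementary(midpoint, z, y, w))
```

The vertex solution is complementary but has $x_2 + z_2 = 0$, whereas the midpoint of the optimal segment gives a strictly complementary pair.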

As  a  warm-up,  we  prove  the  following  theorem. 

Theorem  10.6.  If  both  the  primal  and  the  dual  have  feasible  solutions,  then 
there  exists  a  primal  feasible  solution  (x,  w)  and  a  dual  feasible  solution  (y,  z)  such 
that  x  +  z  >  0  and  y  +  w  >  0. 

Proof. If there is a feasible primal solution x for which $x_j > 0$, then it doesn't
matter whether there is a feasible dual solution whose jth slack variable is strictly
positive. But what about indices j for which $x_j = 0$ for every feasible solution? Let
j be such an index. Consider the following linear programming problem:

$$\begin{array}{ll}
\text{maximize} & x_j \\
\text{subject to} & Ax \le b \\
 & x \ge 0.
\end{array} \tag{10.11}$$

This problem is feasible, since its constraints are the same as for the original primal
problem (10.9). Furthermore, it has an optimal solution (the corresponding objective
function value is zero). The dual of (10.11) is:

$$\begin{array}{ll}
\text{minimize} & b^T y \\
\text{subject to} & A^T y \ge e_j \\
 & y \ge 0.
\end{array}$$

By the strong duality theorem, the dual has an optimal solution, say $y'$. Letting $z'$
denote the corresponding slack variable, we have that

$$A^T y' - z' = e_j, \qquad y', z' \ge 0.$$

Now, let y be any feasible solution to (10.10) and let z be the corresponding slack
variable. Then the above properties of $y'$ and $z'$ imply that $y + y'$ is feasible for
(10.10) and its slack is $z + z' + e_j$. Clearly, for this dual feasible solution we have
that the jth component of its vector of slack variables is at least 1. To summarize,
we have shown that, for each j, there exists a primal feasible solution, call it
$(x^{(j)}, w^{(j)})$, and a dual feasible solution, call it $(y^{(j)}, z^{(j)})$, such that $x_j^{(j)} + z_j^{(j)} > 0$.
In the same way, one can exhibit primal and dual feasible solutions for which each
individual dual variable and its corresponding primal slack add to a positive number.
To complete the proof, we now form a strict convex combination of these n + m
feasible solutions. Since the feasible region for a linear programming problem is
convex, these convex combinations preserve primal and dual feasibility. Since the
convex combination is strict, it follows that every primal variable and its dual slack
add to a strictly positive number as does every dual variable and its primal slack. □

² Given any vector $\xi$, we use the notation $\xi > 0$ to indicate that every component of $\xi$ is strictly
positive: $\xi_j > 0$ for all j.

A variable $x_j$ that must vanish in order for a linear programming problem to
be feasible is called a null variable. The previous theorem says that if a variable is
null, then its dual slack is not null.

The following theorem is called the Strict Complementary Slackness Theorem.

Theorem 10.7. If a linear programming problem has an optimal solution, then
there is an optimal solution $(x^*, w^*)$ and an optimal dual solution $(y^*, z^*)$ such that
$x^* + z^* > 0$ and $y^* + w^* > 0$.

We already know from the complementary slackness theorem (Theorem 5.1)
that $x^*$ and $z^*$ are complementary to each other as are $y^*$ and $w^*$. This theorem
then asserts that the complementary slackness is strict.


Proof. The proof is much the same as the proof of Theorem 10.6 except this
time we look at an index j for which $x_j$ vanishes in every optimal solution. We then
consider the following problem:

$$\begin{array}{ll}
\text{maximize} & x_j \\
\text{subject to} & Ax \le b \\
 & c^T x \ge \zeta^* \\
 & x \ge 0,
\end{array} \tag{10.12}$$

where $\zeta^*$ denotes the objective value of the optimal solution to the original problem.
In addition to the dual variables y corresponding to the $Ax \le b$ constraints, there
is one more dual variable, call it t, associated with the constraint $c^T x \ge \zeta^*$. The
analysis of problem (10.12) is similar to the analysis given in Theorem 10.6 except
that one must now consider two cases: (a) the optimal value of t is strictly positive
and (b) the optimal value of t vanishes. The details are left as an exercise (see
Exercise 10.6). □


Exercises 


10.1 Is $\mathbb{R}^n$ a polyhedron?

10.2 For each $b \in \mathbb{R}^m$, let $\zeta^*(b)$ denote the optimal objective function value
for the following linear program:

$$\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & Ax \le b \\
 & x \ge 0.
\end{array}$$

Suppose that $\zeta^*(b) < \infty$ for all b. Show that the function $\zeta^*(b)$ is concave
(a function f on $\mathbb{R}^m$ is called concave if $f(tx + (1 - t)y) \ge t f(x) + (1 - t) f(y)$
for all x and y in $\mathbb{R}^m$ and all $0 < t < 1$). Hint: Consider the dual
problem.

10.3  Describe  how  one  needs  to  modify  the  proof  of  Theorem  10.4  to  get  a 
proof  of  the  following  result: 
Let $P$ and $\tilde P$ be two disjoint polyhedra in $\mathbb{R}^n$. Then there exist
disjoint generalized halfspaces $H$ and $\tilde H$ such that $P \subset H$ and
$\tilde P \subset \tilde H$.


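10.4 Find a strictly complementary solution to the following linear programming
problem and its dual:

$$\begin{array}{lrcrcl}
\text{maximize} & 2x_1 & + & x_2 & & \\
\text{subject to} & 4x_1 & + & 2x_2 & \le & 6 \\
 & & & x_2 & \le & 1 \\
 & 2x_1 & + & x_2 & \le & 3 \\
 & & & x_1, x_2 & \ge & 0.
\end{array}$$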

10.5  There  is  a  slight  oversimplification  in  the  proof  of  Theorem  10.3.  Can  you 
spot  it?  Can  you  fix  it? 

10.6  Complete  the  proof  of  Theorem  10.7. 

10.7  Interior  solutions.  Prove  the  following  statement:  If  a  linear  programming 
problem  has  feasible  solutions  and  the  set  of  feasible  solutions  is  bounded, 
then  there  is  a  strictly  positive  dual  feasible  solution:  y  >  0  and  z  >  0. 
Hint. It is easier to prove the equivalent statement: if a linear programming
problem has feasible solutions and the dual has null variables, then
the  set  of  primal  feasible  solutions  is  an  unbounded  set. 

Notes 

Caratheodory  (1907)  proved  Theorem  10.3.  Farkas  (1902)  proved  Lemma  10.5. 
Several  similar  results  were  discovered  by  many  others,  including  Gordan  (1873), 
Stiemke  (1915),  Ville  (1938),  and  Tucker  (1956).  The  standard  reference  on  convex 
analysis  is  Rockafellar  (1970). 


CHAPTER  11 


Game  Theory 


In  this  chapter,  we  shall  study  if  not  the  most  practical  then  certainly  an  elegant 
application  of  linear  programming.  The  subject  is  called  game  theory,  and  we  shall 
focus  on  the  simplest  type  of  game,  called  the  finite  two-person  zero-sum  game ,  or 
just  matrix  game  for  short.  Our  primary  goal  shall  be  to  prove  the  famous  Minimax 
Theorem,  which  was  first  discovered  and  proved  by  John  von  Neumann  in  1928. 
His  original  proof  of  this  theorem  was  rather  involved  and  depended  on  another 
beautiful  theorem  from  mathematics,  the  Brouwer  Fixed-Point  Theorem.  However, 
it  eventually  became  clear  that  the  solution  of  matrix  games  could  be  found  by 
solving  a  certain  linear  programming  problem  and  that  the  Minimax  Theorem  is 
just  a  fairly  straightforward  consequence  of  the  Duality  Theorem. 

1.  Matrix  Games 

A  matrix  game  is  a  two-person  game  defined  as  follows.  Each  person  first 
selects,  independently  of  the  other,  an  action  from  a  finite  set  of  choices  (the  two 
players  in  general  will  be  confronted  with  different  sets  of  actions  from  which  to 
choose). Then both reveal to each other their choice. If we let i denote the first
player's choice and j denote the second player's choice, then the rules of the game
stipulate that the first player will pay the second player $a_{ij}$ dollars. The array of
possible payments

$$A = [a_{ij}]$$

is presumed known to both players before the game begins. Of course, if $a_{ij}$ is
negative  for  some  pair  (i,  j),  then  the  payment  goes  in  the  reverse  direction — from 
the  second  player  to  the  first.  For  obvious  reasons,  we  shall  refer  to  the  first  player  as 
the  row  player  and  the  second  player  as  the  column  player .  Since  we  have  assumed 
that  the  row  player  has  only  a  finite  number  of  actions  from  which  to  choose,  we 
can  enumerate  these  actions  and  assume  without  loss  of  generality  that  i  is  simply 
an  integer  selected  from  1  to  m.  Similarly,  we  can  assume  that  j  is  simply  an  index 
ranging  from  1  to  n  (in  its  real-world  interpretation,  row  action  3  will  generally 
have  nothing  to  do  with  column  action  3 — the  number  3  simply  indicates  that  it  is 
the  third  action  in  the  enumerated  list  of  choices). 

Let  us  look  at  a  specific  familiar  example.  Namely,  consider  the  game  every 
child  knows,  called  Paper-Scissors-Rock.  To  refresh  the  memory  of  older  readers, 
this  is  a  two-person  game  in  which  at  the  count  of  three  each  player  declares  either 
Paper,  Scissors,  or  Rock.  If  both  players  declare  the  same  object,  then  the  round  is  a 
draw. But Paper loses to Scissors (since scissors can cut a piece of paper), Scissors
loses to Rock (since a rock can dull scissors), and finally Rock loses to Paper (since
a piece of paper can cover up a rock; it's a weak argument but that's the way the
game is defined). Clearly, for this game, if we enumerate the actions of declaring
Paper, Scissors, or Rock as 1, 2, 3, respectively, then the payoff matrix is

$$\begin{bmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{bmatrix}.$$


With  this  matrix,  neither  player  has  an  obvious  (i.e.,  deterministic)  winning  strategy. 
If  the  column  player  were  always  to  declare  Paper  (hoping  that  the  row  player  will 
declare  Rock),  then  the  row  player  could  counter  by  always  declaring  Scissors  and 
guaranteeing  herself  a  winning  of  one  dollar  in  every  round.  In  fact,  if  the  column 
player  were  to  stick  to  any  specific  declaration,  then  the  row  player  would  eventually 
get  wise  to  it  and  respond  appropriately  to  guarantee  that  she  wins.  Of  course,  the 
same  logic  applies  to  the  row  player.  Hence,  neither  player  should  employ  the  same 
declaration  over  and  over.  Instead,  they  should  randomize  their  declarations.  In  fact, 
due  to  the  symmetry  of  this  particular  game,  both  players  should  make  each  of  the 
three  possible  declarations  with  equal  likelihood. 

But  what  about  less  trivial  games?  For  example,  suppose  that  the  payoffs  in  the 
Paper-Scissors-Rock  game  are  altered  so  that  the  payoff  matrix  becomes 


$$A = \begin{bmatrix} 0 & 1 & -2 \\ -3 & 0 & 4 \\ 5 & -6 & 0 \end{bmatrix}.$$


This  new  game  still  has  the  property  that  every  deterministic  strategy  can  be  foiled 
by  an  intelligent  opponent.  Hence,  randomized  behavior  remains  appropriate.  But 
the  best  probabilities  are  no  longer  uniformly  1/3.  Also,  who  has  the  edge  in  this 
game?  Since  the  total  of  the  payoffs  that  go  from  the  row  player  to  the  column  player 
is  10  whereas  the  total  of  the  payoffs  that  go  to  the  row  player  is  11,  we  suspect  that 
the  row  player  might  have  the  edge.  But  this  is  just  a  guess.  Is  it  correct?  If  it  is 
correct,  how  much  can  the  row  player  expect  to  win  on  average  in  each  round?  If 
the  row  player  knows  this  number  accurately  and  the  column  player  does  not,  then 
the  row  player  could  offer  to  pay  the  column  player  a  small  fee  for  playing  each 
round.  If  the  fee  is  smaller  than  the  expected  winnings,  then  the  row  player  can  still 
be  confident  that  over  time  she  will  make  a  nice  profit.  The  purpose  of  this  chapter 
is  to  answer  these  questions  precisely. 

Let us return now to the general setup. Consider the row player. By a randomized
strategy, we mean that, at each play of the game, it appears (from the column
player's viewpoint) that the row player is making her choices at random according
to some fixed probability distribution. Let $y_i$ denote the probability that the row
player selects action i. The vector y composed of these probabilities is called a
stochastic vector. Mathematically, a vector is a stochastic vector if it has nonnegative
components that sum up to one:

$$y \ge 0 \quad \text{and} \quad e^T y = 1,$$

where e denotes the vector consisting of all ones. Of course, the column player must
also adopt a randomized strategy. Let $x_j$ denote the probability that the column
player selects action j, and let x denote the stochastic vector composed of these
probabilities.

The  expected  payoff  to  the  column  player  is  computed  by  summing  over  all 
possible  outcomes  the  payoff  associated  with  that  outcome  times  the  probability 
of  the  outcome.  The  set  of  possible  outcomes  is  simply  the  set  of  pairs  (i,  j)  as 
i  ranges  over  the  row  indices  (1,  2, . . . ,  m)  and  j  ranges  over  the  column  indices 
(1, 2, ..., n). For outcome (i, j) the payoff is $a_{ij}$, and, assuming that the row and
column players behave independently, the probability of this outcome is simply
$y_i x_j$. Hence, the expected payoff to the column player is

$$\sum_{i,j} y_i a_{ij} x_j = y^T A x.$$
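
As a quick numerical illustration of this formula, the following sketch evaluates $y^T A x$ for the modified Paper-Scissors-Rock matrix given earlier. The function name `expected_payoff` is hypothetical, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

def expected_payoff(y, A, x):
    """Return y^T A x, the expected payment from the row player to the column player."""
    return sum(y[i] * A[i][j] * x[j]
               for i in range(len(y)) for j in range(len(x)))

# Modified Paper-Scissors-Rock payoffs from the text.
A = [[0, 1, -2],
     [-3, 0, 4],
     [5, -6, 0]]

third = Fraction(1, 3)
uniform = [third, third, third]

# Both players randomize uniformly: the average of all nine entries of A.
payoff = expected_payoff(uniform, A, uniform)
print(payoff)
```

With both players mixing uniformly, the expected payoff is simply the average of the entries of A, which is slightly negative and therefore favors the row player under these particular (not necessarily optimal) strategies.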


2.  Optimal  Strategies 


Suppose that the column player adopts strategy x (i.e., decides to play in accordance
with the stochastic vector x). Then the row player's best defense is to use the
strategy $y^*$ that achieves the following minimum:

$$\begin{array}{ll}
\text{minimize} & y^T A x \\
\text{subject to} & e^T y = 1 \\
 & y \ge 0.
\end{array} \tag{11.1}$$


From  the  fundamental  theorem  of  linear  programming,  we  know  that  this  problem 
has  a  basic  optimal  solution.  For  this  problem,  the  basic  solutions  are  simply  y 
vectors  that  are  zero  in  every  component  except  for  one,  which  is  one.  That  is,  the 
basic  optimal  solutions  correspond  to  deterministic  strategies.  This  is  fairly  obvious 
if  we  look  again  at  our  example.  Suppose  that 


$$x = \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix}.$$

Then

$$Ax = \begin{bmatrix} -1/3 \\ 1/3 \\ -1/3 \end{bmatrix},$$

and so the row player's best choice is to select either i = 1 (Paper) or i = 3 (Rock)
or any combination thereof. That is, an optimal solution is $y^* = (1, 0, 0)$ (it is not
unique).
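
The same computation can be sketched in a few lines of code. The helper name `best_row_responses` is an assumption made for illustration; it evaluates Ax and reports the rows (numbered from 1, as in the text) that attain the minimum, which are the row player's best deterministic replies to x.

```python
from fractions import Fraction

def best_row_responses(A, x):
    """Return Ax and the 1-based row indices attaining the minimum of (Ax)_i."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    best = min(Ax)
    return Ax, [i + 1 for i, value in enumerate(Ax) if value == best]

# Modified Paper-Scissors-Rock payoffs from the text.
A = [[0, 1, -2],
     [-3, 0, 4],
     [5, -6, 0]]

x = [Fraction(1, 3)] * 3
Ax, rows = best_row_responses(A, x)
print(Ax, rows)
```

Any stochastic vector y supported on the reported rows achieves the same minimum, which is why the optimal solution to (11.1) is not unique here.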

Since  for  any  given  x  the  row  player  will  adopt  the  strategy  that  achieves  the 
minimum  in  (11.1),  it  follows  that  the  column  player  should  employ  a  strategy  x* 
that  attains  the  following  maximum: 


$$\max_x \min_y \; y^T A x, \tag{11.2}$$

where  the  max  and  the  min  are  over  all  stochastic  vectors  (of  the  appropriate 
dimension). 
The question then becomes: how do we solve (11.2)? It turns out that this
problem can be reformulated as a linear programming problem. Indeed, we have
already seen that the inner optimization (the minimization) can be taken over just
the deterministic strategies:

$$\min_y \; y^T A x = \min_i \; e_i^T A x,$$

where we have used $e_i$ to denote the vector of all zeros except for a one in position
i. Hence, the max-min problem given in (11.2) can be rewritten as

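$$\begin{array}{ll}
\text{maximize} & \min_i \; e_i^T A x \\
\text{subject to} & \sum_{j=1}^{n} x_j = 1 \\
 & x_j \ge 0, \quad j = 1, 2, \ldots, n.
\end{array}$$

Now, if we introduce a new variable, v, representing a lower bound on the $e_i^T A x$'s,
then we see that the problem can be recast as a linear program:

$$\begin{array}{ll}
\text{maximize} & v \\
\text{subject to} & v \le e_i^T A x, \quad i = 1, 2, \ldots, m \\
 & \sum_{j=1}^{n} x_j = 1 \\
 & x_j \ge 0, \quad j = 1, 2, \ldots, n.
\end{array}$$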

Switching  back  to  vector  notation,  the  problem  can  be  written  as 

$$\begin{array}{ll}
\text{maximize} & v \\
\text{subject to} & ve - Ax \le 0 \\
 & e^T x = 1 \\
 & x \ge 0.
\end{array}$$


Finally,  writing  in  block-matrix  form,  we  get 


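$$\begin{array}{ll}
\text{maximize} & \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ v \end{bmatrix} \\[6pt]
\text{subject to} & \begin{bmatrix} -A & e \\ e^T & 0 \end{bmatrix} \begin{bmatrix} x \\ v \end{bmatrix} \begin{matrix} \le \\ = \end{matrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \\[6pt]
 & x \ge 0, \quad v \text{ free}.
\end{array} \tag{11.3}$$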
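
To make the block structure of (11.3) concrete, the sketch below assembles its data for a given payoff matrix. The function name `column_player_lp` and the returned layout (objective, inequality block, equality block) are illustrative assumptions; the arrays are arranged in the shape that general-purpose LP solvers typically accept, with the variables ordered as $(x_1, \ldots, x_n, v)$.

```python
def column_player_lp(A):
    """Build the data of (11.3): maximize c^T (x, v) subject to
    A_ub (x, v) <= b_ub, A_eq (x, v) = b_eq, x >= 0, v free."""
    m, n = len(A), len(A[0])
    c = [0] * n + [1]                                  # objective [0 1]
    A_ub = [[-a for a in row] + [1] for row in A]      # block [-A  e]
    b_ub = [0] * m
    A_eq = [[1] * n + [0]]                             # block [e^T 0]
    b_eq = [1]
    return c, A_ub, b_ub, A_eq, b_eq

# Modified Paper-Scissors-Rock payoffs from the text.
A = [[0, 1, -2],
     [-3, 0, 4],
     [5, -6, 0]]

c, A_ub, b_ub, A_eq, b_eq = column_player_lp(A)
for row in A_ub + A_eq:
    print(row)
```

Note that a solver that minimizes by default would need the objective negated, and v must be declared free rather than nonnegative.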


Now  let’s  turn  it  around.  By  symmetry,  the  row  player  seeks  a  strategy  y*  that 
attains  optimality  in  the  following  min-max  problem: 

$$\min_y \max_x \; y^T A x,$$

which can be reformulated as the following linear program:

$$\begin{array}{ll}
\text{minimize} & u \\
\text{subject to} & ue - A^T y \ge 0 \\
 & e^T y = 1 \\
 & y \ge 0.
\end{array}$$
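Writing in block-matrix form, we get

$$\begin{array}{ll}
\text{minimize} & \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} y \\ u \end{bmatrix} \\[6pt]
\text{subject to} & \begin{bmatrix} -A^T & e \\ e^T & 0 \end{bmatrix} \begin{bmatrix} y \\ u \end{bmatrix} \begin{matrix} \ge \\ = \end{matrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \\[6pt]
 & y \ge 0, \quad u \text{ free}.
\end{array} \tag{11.4}$$
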


3.  The  Minimax  Theorem 

Having reduced the computation of the optimal strategies $x^*$ and $y^*$ to the solution
of linear programs, it is now a simple matter to show that they are consistent
with each other. The next theorem, which establishes this consistency, is called the
Minimax Theorem:

Theorem 11.1. There exist stochastic vectors $x^*$ and $y^*$ for which

$$\max_x \; y^{*T} A x = \min_y \; y^T A x^*.$$

Proof. The proof follows trivially from the observation that (11.4) is the dual
of (11.3). Therefore, $v^* = u^*$. Furthermore,

$$v^* = \min_i \; e_i^T A x^* = \min_y \; y^T A x^*,$$

and similarly,

$$u^* = \max_j \; e_j^T A^T y^* = \max_x \; x^T A^T y^* = \max_x \; y^{*T} A x. \qquad \Box$$


The common optimal value $v^* = u^*$ of the primal and dual linear programs is
called the value of the game. From the Minimax Theorem, we see that, by adopting
strategy $y^*$, the row player assures herself of losing no more than v units per round
on the average. Similarly, the column player can assure himself of winning at least v
units per round on the average by adopting strategy $x^*$. A game whose value is zero
is therefore a fair game. Games where the roles of the two players are interchangeable
are clearly fair. Such games are called symmetric. They are characterized by
payoff matrices having the property that $a_{ij} = -a_{ji}$ for all i and j (in particular, m
must equal n and the diagonal must vanish).
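
The symmetry condition is simple to test mechanically. The following sketch (the name `is_symmetric_game` is hypothetical) checks that the payoff matrix is square and equals the negative of its transpose, and applies the test to the two Paper-Scissors-Rock matrices from Section 1.

```python
def is_symmetric_game(A):
    """True if A is square and a_ij = -a_ji for all i, j."""
    n = len(A)
    return (all(len(row) == n for row in A)
            and all(A[i][j] == -A[j][i] for i in range(n) for j in range(n)))

original = [[0, 1, -1],
            [-1, 0, 1],
            [1, -1, 0]]
modified = [[0, 1, -2],
            [-3, 0, 4],
            [5, -6, 0]]

print(is_symmetric_game(original), is_symmetric_game(modified))
```

The original game passes the test and is therefore fair; the modified game fails it, so its value has to be computed rather than read off from symmetry.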

For the modified Paper-Scissors-Rock game, the linear programming problem that the
column player needs to solve is


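$$\begin{array}{ll}
\text{maximize} & v \\[4pt]
\text{subject to} & \begin{bmatrix} 0 & -1 & 2 & 1 \\ 3 & 0 & -4 & 1 \\ -5 & 6 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ v \end{bmatrix} \begin{matrix} \le \\ \le \\ \le \\ = \end{matrix} \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \\[4pt]
 & x_1, x_2, x_3 \ge 0, \quad v \text{ free}.
\end{array}$$
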
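In nonmatrix notation, it looks like this:

$$\begin{array}{lrcrcrcrcl}
\text{maximize} & & & & & & & v & & \\
\text{subject to} & & - & x_2 & + & 2x_3 & + & v & \le & 0 \\
 & 3x_1 & & & - & 4x_3 & + & v & \le & 0 \\
 & -5x_1 & + & 6x_2 & & & + & v & \le & 0 \\
 & x_1 & + & x_2 & + & x_3 & & & = & 1 \\
 & & & & & x_1, x_2, x_3 & & & \ge & 0.
\end{array}$$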

This  linear  programming  problem  deviates  from  our  standard  inequality  form  in 
two  respects:  (1)  it  has  an  equality  constraint  and  (2)  it  has  a  free  variable.  There 
are  several  ways  in  which  one  can  convert  this  problem  into  standard  form.  The 
most compact conversion is as follows. First, use the equality constraint to solve
explicitly for one of the $x_j$'s, say $x_3$:

$$x_3 = 1 - x_1 - x_2.$$

Then  eliminate  this  variable  from  the  remaining  equations  to  get 

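$$\begin{array}{lrcrcrcl}
\text{maximize} & & & & & v & & \\
\text{subject to} & -2x_1 & - & 3x_2 & + & v & \le & -2 \\
 & 7x_1 & + & 4x_2 & + & v & \le & 4 \\
 & -5x_1 & + & 6x_2 & + & v & \le & 0 \\
 & x_1 & + & x_2 & & & \le & 1 \\
 & & & x_1, x_2 & & & \ge & 0.
\end{array}$$

The elimination of $x_3$ has changed the last constraint from an equality into an
inequality.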

The  next  step  is  to  write  down  a  starting  dictionary.  To  do  this,  we  need  to 
introduce  slack  variables  for  each  of  these  constraints.  It  is  natural  (and  desirable) 
to  denote  the  slack  variable  for  the  last  constraint  by  x3.  In  fact,  doing  this,  we  get 
the  following  starting  dictionary: 

$$\begin{array}{rcl}
\zeta &=& v \\ \hline
x_4 &=& -2 + 2x_1 + 3x_2 - v \\
x_5 &=& 4 - 7x_1 - 4x_2 - v \\
x_6 &=& 5x_1 - 6x_2 - v \\
x_3 &=& 1 - x_1 - x_2.
\end{array}$$

The  variable  v  is  not  constrained  to  be  nonnegative.  Therefore,  there  is  no  reason 
for  it  to  be  nonbasic.  Let  us  do  an  arbitrary  pivot  with  v  as  the  entering  variable  and 
any  basic  variable  as  the  leaving  variable  (well,  not  exactly  any — we  must  make 
sure that it causes no division by 0, so therefore $x_3$ is not a candidate). Picking $x_4$
to leave, we get

$$\begin{array}{rcl}
\zeta &=& -2 + 2x_1 + 3x_2 - x_4 \\ \hline
v &=& -2 + 2x_1 + 3x_2 - x_4 \\
x_5 &=& 6 - 9x_1 - 7x_2 + x_4 \\
x_6 &=& 2 + 3x_1 - 9x_2 + x_4 \\
x_3 &=& 1 - x_1 - x_2.
\end{array}$$
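
The pivot just performed can be reproduced mechanically. In the sketch below, a dictionary is stored as a Python mapping from each basic variable (and the objective, under the assumed key 'zeta') to its row, where each row maps the assumed key '1' to the constant term and each nonbasic variable name to its coefficient. The representation and the function name `pivot` are illustrative assumptions, not the book's software.

```python
from fractions import Fraction

def pivot(rows, entering, leaving):
    """Make `entering` basic and `leaving` nonbasic; return the new dictionary."""
    row = rows[leaving]
    a = Fraction(row[entering])                    # must be nonzero
    # Solve the leaving row for the entering variable.
    solved = {k: -Fraction(c) / a for k, c in row.items() if k != entering}
    solved[leaving] = Fraction(1) / a
    solved = {k: c for k, c in solved.items() if c != 0}
    new_rows = {entering: solved}
    # Substitute the solved expression into every other row.
    for name, r in rows.items():
        if name == leaving:
            continue
        coef = Fraction(r.get(entering, 0))
        out = {k: Fraction(c) for k, c in r.items() if k != entering}
        for k, c in solved.items():
            out[k] = out.get(k, Fraction(0)) + coef * c
        new_rows[name] = {k: c for k, c in out.items() if c != 0}
    return new_rows

# Starting dictionary from the text.
start = {
    'zeta': {'v': 1},
    'x4': {'1': -2, 'x1': 2, 'x2': 3, 'v': -1},
    'x5': {'1': 4, 'x1': -7, 'x2': -4, 'v': -1},
    'x6': {'x1': 5, 'x2': -6, 'v': -1},
    'x3': {'1': 1, 'x1': -1, 'x2': -1},
}

after = pivot(start, entering='v', leaving='x4')
for name, r in after.items():
    print(name, r)
```

Because the row for $x_3$ does not involve v, the coefficient a would be zero there, which is the division by 0 that rules out $x_3$ as the leaving variable.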



Since  v  is  free  of  sign  constraints,  it  will  never  leave  the  basis  (since  a  leaving 
variable  is,  by  definition,  a  variable  that  hits  its  lower  bound — v  has  no  such  bound). 
Therefore,  we  may  as  well  remove  it  from  the  dictionary  altogether;  it  can  always 
be  computed  at  the  end.  Hence,  we  note  that 

\[ v = -2 + 2x_1 + 3x_2 - x_4, \]

or better yet that

\[ v = \xi, \]

and the dictionary now becomes

\[
\begin{aligned}
\xi &= -2 + 2x_1 + 3x_2 - x_4 \\
x_5 &= 6 - 9x_1 - 7x_2 + x_4 \\
x_6 &= 2 + 3x_1 - 9x_2 + x_4 \\
x_3 &= 1 - x_1 - x_2.
\end{aligned}
\]


At  last,  we  are  in  a  position  to  apply  the  simplex  method.  Two  (tedious)  iterations 
bring  us  to  the  optimal  dictionary.  Since  it  involves  fractions,  we  multiply  each 
equation  by  an  integer  to  make  each  number  in  the  dictionary  an  integer.  Indeed, 
after  multiplying  by  102,  the  optimal  dictionary  is  given  by 

\[
\begin{aligned}
102\,\xi &= -16 - 27x_5 - 13x_6 - 62x_4 \\
102\,x_1 &= 40 - 9x_5 + 7x_6 + 2x_4 \\
102\,x_2 &= 36 - 3x_5 - 9x_6 + 12x_4 \\
102\,x_3 &= 26 + 12x_5 + 2x_6 - 14x_4.
\end{aligned}
\]


From this dictionary, it is easy to read off the optimal primal solution:

\[ x^* = \begin{bmatrix} 40/102 \\ 36/102 \\ 26/102 \end{bmatrix}. \]

Also, since $x_4$, $x_5$, and $x_6$ are complementary to $y_1$, $y_2$, and $y_3$ in the dual problem,
the optimal dual solution is

\[ y^* = \begin{bmatrix} 62/102 \\ 27/102 \\ 13/102 \end{bmatrix}. \]

Finally, the value of the game is

\[ v^* = \xi^* = -16/102 = -0.15686275, \]


which  indicates  that  the  row  player  does  indeed  have  an  advantage  and  can  expect 
to  make  on  the  average  close  to  16  cents  per  round. 
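The optimal dictionary can be checked without redoing the pivots. The sketch below is an illustration under stated assumptions: it takes the three inequality constraints of the reduced problem as written above, tests that $x^*$ and $v^*$ are feasible, and then combines the constraints with the weights $y^*$. If the combination cancels $x_1$ and $x_2$ and leaves $v \le -16/102$, no feasible point can do better, so the solution is optimal. All variable names are chosen here.

```python
from fractions import Fraction as F

# Constraints of the reduced problem as (coef of x1, coef of x2, coef of v, rhs),
# each meaning  c1*x1 + c2*x2 + cv*v <= rhs.
rows = [(-2, -3, 1, -2), (7, 4, 1, 4), (-5, 6, 1, 0)]

x1, x2, x3 = F(40, 102), F(36, 102), F(26, 102)
v = F(-16, 102)
y = [F(62, 102), F(27, 102), F(13, 102)]

# Primal feasibility: a probability vector that satisfies every constraint.
primal_ok = (x1 + x2 + x3 == 1 and min(x1, x2, x3) >= 0
             and all(a * x1 + b * x2 + c * v <= r for a, b, c, r in rows))

# All three constraints hold with equality at the optimum.
tight = [a * x1 + b * x2 + c * v == r for a, b, c, r in rows]

# Weighted combination of the constraints with the dual weights y.
combo = [sum(yi * row[k] for yi, row in zip(y, rows)) for k in range(4)]

print(primal_ok, tight)
print([str(c) for c in combo])  # coefficients of x1, x2, v, and the right-hand side
```

The combined inequality has coefficients 0, 0, and 1 on $x_1$, $x_2$, and $v$, and right-hand side $-16/102$, which is the bound attained by $v^*$.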


4.  Poker 

Some card games such as poker involve a round of bidding in which the players
at times bluff by increasing their bid in an attempt to coerce their opponents
into backing down, even though if the challenge is accepted they will surely lose.
Similarly, they will sometimes underbid to give their opponents false hope. In this
section, we shall study a simplified version of poker (the real game is too hard to
analyze) to see if bluffing and underbidding are justified bidding strategies.

Simplified  poker  involves  two  players,  A  and  B,  and  a  deck  having  three  cards, 
1,  2,  and  3.  At  the  beginning  of  a  round,  each  player  “antes  up”  $1  and  is  dealt  one 
card  from  the  deck.  A  bidding  session  follows  in  which  each  player  in  turn,  starting 
with  A,  either  (a)  bets  and  adds  $1  to  the  “kitty”  or  (b)  passes.  Bidding  terminates 
when 


a  bet  is  followed  by  a  bet, 
a  pass  is  followed  by  a  pass,  or 
a  bet  is  followed  by  a  pass. 

In  the  first  two  cases,  the  winner  of  the  round  is  decided  by  comparing  cards,  and 
the  kitty  goes  to  the  player  with  the  higher  card.  In  the  third  case,  bet  followed  by 
pass,  the  player  who  bet  wins  the  round  independently  of  who  had  the  higher  card 
(in  real  poker,  the  player  who  passes  is  said  to  fold). 

With  these  simplified  betting  rules,  there  are  only  five  possible  betting 
scenarios: 


A passes, B passes:            $1 to holder of higher card
A passes, B bets, A passes:    $1 to B
A passes, B bets, A bets:      $2 to holder of higher card
A bets,   B passes:            $1 to A
A bets,   B bets:              $2 to holder of higher card

After  being  dealt  a  card,  player  A  will  decide  to  bet  along  one  of  three  lines: 

1.  Pass.  If  B  bets,  pass  again. 

2.  Pass.  If  B  bets,  bet. 

3.  Bet. 

Similarly,  after  being  dealt  a  card,  player  B  can  bet  along  one  of  four  lines: 

1.  Pass  no  matter  what. 

2.  If  A  passes,  pass,  but  if  A  bets,  bet. 

3.  If  A  passes,  bet,  but  if  A  bets,  pass. 

4.  Bet  no  matter  what. 

To model the situation as a matrix game, we must identify each player’s pure
strategies. A pure strategy is a statement of what line of betting a player intends
to follow for each possible card that the player is dealt. Hence, the players’ pure
strategies can be denoted by triples $(y_1, y_2, y_3)$, where $y_i$ is the line of betting
that the player will use when holding card $i$. (For player A, the $y_i$’s can take
values 1, 2, and 3, whereas for player B, they can take values 1, 2, 3, and 4.)

Given a pure strategy for both players, one can compute the average payment
from, say, A to B. For example, suppose that player A adopts strategy $(3, 1, 2)$ and
player B adopts strategy $(3, 2, 4)$. There are six ways in which the cards can be
dealt, and we can analyze each of them as follows:


card dealt                                     payment
 A    B     betting session                    A to B
 1    2     A bets, B bets                        2
 1    3     A bets, B bets                        2
 2    1     A passes, B bets, A passes            1
 2    3     A passes, B bets, A passes            1
 3    1     A passes, B bets, A bets             -2
 3    2     A passes, B passes                   -1

Since each of the six deals is equally likely, the average payment from A to B is
$(2 + 2 + 1 + 1 - 2 - 1)/6 = 0.5$.
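The bookkeeping in this table is easy to automate. The following sketch encodes the betting rules and the numbered lines of betting exactly as listed above; the function names `payment_a_to_b` and `average_payment` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import permutations

def payment_a_to_b(card_a, line_a, card_b, line_b):
    """Dollars that A pays B in one round (negative means B pays A)."""
    high = 1 if card_b > card_a else -1       # +1 when B holds the higher card
    if line_a == 3:                           # A opens with a bet
        b_bets = line_b in (2, 4)             # B's answer to a bet
        return 2 * high if b_bets else -1     # showdown for $2, or B folds
    b_bets = line_b in (3, 4)                 # A passed; B's answer to a pass
    if not b_bets:
        return high                           # pass, pass: showdown for $1
    if line_a == 1:
        return 1                              # A passes again and loses $1
    return 2 * high                           # line 2: A answers the bet with a bet

def average_payment(strategy_a, strategy_b):
    """Average payment from A to B over the six equally likely deals."""
    deals = list(permutations((1, 2, 3), 2))  # (A's card, B's card)
    total = sum(payment_a_to_b(a, strategy_a[a - 1], b, strategy_b[b - 1])
                for a, b in deals)
    return Fraction(total, len(deals))

print(average_payment((3, 1, 2), (3, 2, 4)))
```

For the pair of strategies in the example, the printed value is 1/2, in agreement with the hand calculation.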

The calculation of the average payment must be carried out for every combination
of pairs of strategies. How many are there? Player A has $3 \times 3 \times 3 = 27$
pure strategies and player B has $4 \times 4 \times 4 = 64$ pure strategies. Hence, there are
$27 \times 64 = 1{,}728$ pairs. Calculating the average payment for all these pairs is a
daunting task. Fortunately, we can reduce the number of pure strategies (and hence the
number of pairs) that need to be considered by making a few simple observations.

The  first  observation  is  that  a  player  holding  a  1  should  never  answer  a  bet  with 
a  bet,  since  the  player  will  lose  regardless  of  the  answering  bet  and  will  lose  less  by 
passing.  This  logic  implies  that,  when  holding  a  1, 

player  A  should  refrain  from  betting  along  line  2; 
player  B  should  refrain  from  betting  along  lines  2  and  4. 

More  clearly  improvable  strategies  can  be  ruled  out  when  holding  the  highest 
card.  For  example,  a  player  holding  a  3  should  never  answer  a  bet  with  a  pass,  since 
by  passing  the  player  will  lose,  but  by  betting  the  player  will  win.  Furthermore, 
when  holding  a  3,  a  player  should  always  answer  a  pass  with  a  bet,  since  in  either 
case  the  player  is  going  to  win,  but  answering  with  a  bet  opens  the  possibility  of 
the  opponent  betting  again  and  thereby  increasing  the  size  of  the  win  for  the  player 
holding  the  3.  Hence,  when  holding  a  3, 

player  A  should  refrain  from  betting  along  line  1; 
player B should refrain from betting along lines 1, 2, and 3.

Eliminating from consideration the above lines of betting, we see that player
A now has $2 \times 3 \times 2 = 12$ pure strategies and player B has $2 \times 4 \times 1 = 8$ pure
strategies. The number of pairs has therefore dropped to 96, a significant reduction.
Not only do we eliminate these “bad” strategies from the mathematical model, but
we also assume that both players know that these bad strategies will not be used.
That is, player A can assume that player B will play intelligently, and player B
can assume the same of A. This knowledge then leads to further reductions. For
example, when holding a 2, player A should refrain from betting along line 3. To
reach this conclusion, we must carefully enumerate possibilities. Since player A
holds the 2, player B holds either the 1 or the 3. But we’ve already determined what
player B will do in both of those cases. Using this knowledge, it is not hard to see
that player A would be unwise to bet along line 3. A similar analysis reveals that,
when holding a 2, player B should refrain from lines 3 and 4. Therefore, player A
now has only $2 \times 2 \times 2 = 8$ pure strategies and player B has only $2 \times 2 \times 1 = 4$
pure strategies.
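The enumeration behind these last two reductions can be spelled out in a few lines. The sketch below is one way to do it, with the betting rules coded as described earlier in this section and with names chosen here. For the player holding the 2, it totals the payments over the opponent's two possible cards, once for each combination of lines that the opponent still has available after the first round of eliminations.

```python
def payment_a_to_b(card_a, line_a, card_b, line_b):
    """Dollars that A pays B in one round (negative means B pays A)."""
    high = 1 if card_b > card_a else -1       # +1 when B holds the higher card
    if line_a == 3:                           # A opens with a bet
        return 2 * high if line_b in (2, 4) else -1
    if line_b not in (3, 4):                  # A passed and B passes too
        return high
    return 1 if line_a == 1 else 2 * high     # B bet: A passes again or bets

# A holds the 2. B holds the 1 (lines 1 or 3 remain) or the 3 (only line 4 remains).
a_totals = {
    line_a: [payment_a_to_b(2, line_a, 1, z1) + payment_a_to_b(2, line_a, 3, 4)
             for z1 in (1, 3)]
    for line_a in (1, 2, 3)
}

# B holds the 2. A holds the 1 (lines 1 or 3 remain) or the 3 (lines 2 or 3 remain).
b_totals = {
    line_b: [payment_a_to_b(1, y1, 2, line_b) + payment_a_to_b(3, y3, 2, line_b)
             for y1 in (1, 3) for y3 in (2, 3)]
    for line_b in (1, 2, 3, 4)
}

print(a_totals)  # A pays, so smaller entries are better for A
print(b_totals)  # B receives, so larger entries are better for B
```

Entry by entry, A's line 3 never costs less than line 2, and B's lines 3 and 4 never earn more than lines 1 and 2, respectively, which is the sense in which those lines can be dropped.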

At  this  point,  no  further  reductions  are  possible.  Computing  the  payoff  matrix, 
we  get 


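\[
A = \begin{array}{c|cccc}
        & (1,1,4) & (1,2,4) & (3,1,4) & (3,2,4) \\ \hline
(1,1,2) & 0    & 0    & 1/6  & 1/6  \\
(1,1,3) & 0    & -1/6 & 1/3  & 1/6  \\
(1,2,2) & 1/6  & 1/6  & -1/6 & -1/6 \\
(1,2,3) & 1/6  & 0    & 0    & -1/6 \\
(3,1,2) & -1/6 & 1/3  & 0    & 1/2  \\
(3,1,3) & -1/6 & 1/6  & 1/6  & 1/2  \\
(3,2,2) & 0    & 1/2  & -1/3 & 1/6  \\
(3,2,3) & 0    & 1/3  & -1/6 & 1/6
\end{array}
\]

(rows are indexed by player A’s pure strategies and columns by player B’s; each
entry is the average payment from A to B).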

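Solving the matrix game, we find that

\[ y^* = \begin{bmatrix} \tfrac12 & 0 & 0 & \tfrac13 & 0 & 0 & 0 & \tfrac16 \end{bmatrix}^T \]

and

\[ x^* = \begin{bmatrix} \tfrac23 & 0 & 0 & \tfrac13 \end{bmatrix}^T. \]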


These  stochastic  vectors  can  be  summarized  as  simple  statements  of  the  optimal 
randomized  strategies  for  the  two  players.  Indeed,  player  A’s  optimal  strategy  is  as 
follows: 


when  holding  1,  mix  lines  1  and  3  in  5:1  proportion; 
when  holding  2,  mix  lines  1  and  2  in  1:1  proportion; 
when  holding  3,  mix  lines  2  and  3  in  1:1  proportion. 

Similarly, player B’s optimal strategy can be described as

when holding 1, mix lines 1 and 3 in 2:1 proportion;
when holding 2, mix lines 1 and 2 in 2:1 proportion;
when holding 3, use line 4.

Note that it is optimal for player A to use line 3 when holding a 1 at least some of
the time. Since line 3 says to bet, this bet is a bluff. Player B also bluffs sometimes,
since betting line 3 is sometimes used when holding a 1. Clearly, the optimal
strategies also exhibit some underbidding.
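As a cross-check on the reduced game, the payoff matrix can be regenerated from the betting rules and the stated optimal mixtures tested against it. The sketch below is illustrative only: the rules are coded as described in this section, the function and variable names are chosen here, and optimality is tested through the saddle-point condition that no column earns player B more than $y^{*T} A x^*$ against $y^*$ and no row costs player A less than that against $x^*$.

```python
from fractions import Fraction as F
from itertools import permutations

def payment_a_to_b(card_a, line_a, card_b, line_b):
    """Dollars that A pays B in one round (negative means B pays A)."""
    high = 1 if card_b > card_a else -1       # +1 when B holds the higher card
    if line_a == 3:                           # A opens with a bet
        return 2 * high if line_b in (2, 4) else -1
    if line_b not in (3, 4):                  # A passed and B passes too
        return high
    return 1 if line_a == 1 else 2 * high     # B bet: A passes again or bets

def average_payment(strategy_a, strategy_b):
    deals = list(permutations((1, 2, 3), 2))  # (A's card, B's card)
    return F(sum(payment_a_to_b(a, strategy_a[a - 1], b, strategy_b[b - 1])
                 for a, b in deals), len(deals))

# Pure strategies that survive the eliminations, in the order used in the text.
rows = [(y1, y2, y3) for y1 in (1, 3) for y2 in (1, 2) for y3 in (2, 3)]
cols = [(z1, z2, 4) for z1 in (1, 3) for z2 in (1, 2)]
A = [[average_payment(r, c) for c in cols] for r in rows]

y = [F(1, 2), 0, 0, F(1, 3), 0, 0, 0, F(1, 6)]   # player A (rows)
x = [F(2, 3), 0, 0, F(1, 3)]                     # player B (columns)

# Expected payment against each pure strategy of the opponent.
against_columns = [sum(y[i] * A[i][j] for i in range(8)) for j in range(4)]
against_rows = [sum(A[i][j] * x[j] for j in range(4)) for i in range(8)]

print([str(val) for val in against_columns])
print([str(val) for val in against_rows])
```

With the rules coded this way, both lists are constant and equal to 1/18, so neither player can gain by deviating; on average player A pays player B one eighteenth of a dollar per round.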

Exercises 

11.1  Players  A  and  B  each  hide  a  nickel  or  a  dime.  If  the  hidden  coins  match, 
player A gets both; if they don’t match, then B gets both. Find the optimal
strategies. Which player has the advantage? Solve the problem for
arbitrary  denominations  a  and  b. 

11.2  Players  A  and  B  each  pick  a  number  between  1  and  100.  The  game  is  a 
draw  if  both  players  pick  the  same  number.  Otherwise,  the  player  who 
picks  the  smaller  number  wins  unless  that  smaller  number  is  one  less  than 
the  opponent’s  number,  in  which  case  the  opponent  wins.  Find  the  optimal 
strategy  for  this  game. 

11.3 We say that row $r$ dominates row $s$ if $a_{rj} \ge a_{sj}$ for all $j = 1, 2, \ldots, n$.
Similarly, column $r$ is said to dominate column $s$ if $a_{ir} \ge a_{is}$ for all
$i = 1, 2, \ldots, m$. Show that

(a) If a row (say, $r$) dominates another row, then the row player has an
optimal strategy $y^*$ in which $y^*_r = 0$.

(b) If a column (say, $s$) is dominated by another column, then the column
player has an optimal strategy $x^*$ in which $x^*_s = 0$.

Use  these  results  to  reduce  the  following  payoff  matrix  to  a  2  x  2  matrix: 


6 

to 

-4 

-7 

-5 

0 

4 

to 

-9 

-1 

7 

CO 

CO 

1 

00 

-2 

to 

CO 

6 

0 

CO 

11.4  Solve  simplified  poker  assuming  that  antes  are  $2  and  bets  are  $1. 

11.5  Give  necessary  and  sufficient  conditions  for  the  rth  pure  strategy  of  the 
row  and  the  sth  pure  strategy  of  the  column  player  to  be  simultaneously 
optimal. 

11.6  Use  the  Minimax  Theorem  to  show  that 

\[ \max_x \min_y \; y^T A x = \min_y \max_x \; y^T A x. \]

11.7  Bimatrix  Games.  Consider  the  following  two-person  game  defined  in 
terms  of  a  pair  of  m  x  n  matrices  A  and  B:  if  the  row  player  selects 
row index $i$ and the column player selects column index $j$, then the row
player pays $a_{ij}$ dollars and the column player pays $b_{ij}$ dollars. Stochastic
vectors $x^*$ and $y^*$ are said to form a Nash equilibrium if

\[
\begin{aligned}
y^{*T} A x^* &\le y^T A x^* && \text{for all } y, \\
y^{*T} B x^* &\le y^{*T} B x && \text{for all } x.
\end{aligned}
\]

The purpose of this exercise is to relate Nash equilibria to the problem of
finding vectors $x$ and $y$ that satisfy

\[
\begin{aligned}
\begin{bmatrix} 0 & -A \\ -B^T & 0 \end{bmatrix}
\begin{bmatrix} y \\ x \end{bmatrix}
+ \begin{bmatrix} w \\ z \end{bmatrix}
  &= \begin{bmatrix} -e \\ -e \end{bmatrix}, \\
y_i w_i &= 0, \quad \text{for all } i, \\
x_j z_j &= 0, \quad \text{for all } j, \\
x, w, y, z &\ge 0
\end{aligned}
\tag{11.5}
\]

(vectors $w$ and $z$ can be thought of as being defined by the matrix equality).
Problem (11.5) is called a linear complementarity problem.

(a) Show that there is no loss in generality in assuming that $A$ and $B$
have all positive entries.

(b) Assuming that $A$ and $B$ have all positive entries, show that, if $(x^*, y^*)$
is a Nash equilibrium, then

\[ x' = \frac{x^*}{y^{*T} A x^*}, \qquad y' = \frac{y^*}{y^{*T} B x^*} \]

solves the linear complementarity problem (11.5).

(c) Show that, if $(x', y')$ solves the linear complementarity problem
(11.5), then

\[ x^* = \frac{x'}{e^T x'}, \qquad y^* = \frac{y'}{e^T y'} \]

is a Nash equilibrium.

(An algorithm for solving the linear complementarity problem is developed
in Exercise 18.7.)


11.8  The  Game  of  Morra.  Two  players  simultaneously  throw  out  one  or  two 
fingers  and  call  out  their  guess  as  to  what  the  total  sum  of  the  outstretched 
fingers  will  be.  If  a  player  guesses  right,  but  his  opponent  does  not,  he 
receives  payment  equal  to  his  guess.  In  all  other  cases,  it  is  a  draw. 

(a)  List  the  pure  strategies  for  this  game. 

(b)  Write  down  the  payoff  matrix  for  this  game. 

(c) Formulate the row player’s problem as a linear programming problem.
(Hint: Recall that the row player’s problem is to minimize the
maximum expected payout.)

(d)  What  is  the  value  of  this  game? 

(e)  Find  the  optimal  randomized  strategy. 


11.9  Heads  I  Win — Tails  You  Lose.  In  the  classical  coin-tossing  game,  player 
A  tosses  a  fair  coin.  If  it  comes  up  heads  player  B  pays  player  A  $2  but 
if  it  comes  up  tails  player  A  pays  player  B  $2.  As  a  two-person  zero- 
sum  game,  this  game  is  rather  trivial  since  neither  player  has  anything  to 
decide  (after  agreeing  to  play  the  game).  In  fact,  the  matrix  for  this  game 
is  a  1  x  1  matrix  with  only  a  zero  in  it,  which  represents  the  expected 
payoff from player A to B.

Now consider the same game with the following twist. Player A is
allowed  to  peek  at  the  outcome  and  then  decide  either  to  stay  in  the  game 
or  to  bow  out.  If  player  A  bows  out,  then  he  automatically  loses  but  only 
has  to  pay  player  B  $1.  Of  course,  player  A  must  inform  player  B  of 
his  decision.  If  his  decision  is  to  stay  in  the  game,  then  player  B  has  the 
option  either  to  stay  in  the  game  or  not.  If  she  decides  to  get  out,  then  she 
loses  $1  to  player  A.  If  both  players  stay  in  the  game,  then  the  rules  are 
as  in  the  classical  game:  heads  means  player  A  wins,  tails  means  player  B 
wins. 

(a)  List  the  strategies  for  each  player  in  this  game.  (Hint:  Don’t  forget 
that  a  strategy  is  something  that  a  player  has  control  over.) 

(b)  Write  down  the  payoff  matrix. 

(c) A few of player A’s strategies are uniformly inferior to others. These
strategies can be ruled out. Which of player A’s strategies can be
ruled out?

(d) Formulate the row player’s problem as a linear programming problem.
(Hints: (1) Recall that the row player’s problem is to minimize
the maximum expected payout. (2) Don’t include rows that you ruled
out in the previous part.)

(e)  Find  the  optimal  randomized  strategy. 

(f)  Discuss  whether  this  game  is  interesting  or  not. 

Notes 

The Minimax Theorem was proved by von Neumann (1928). Important references
include Gale et al. (1951), von Neumann and Morgenstern (1947), Karlin
(1959), and Dresher (1961). Simplified poker was invented and analyzed by Kuhn
(1950). Exercises 11.1 and 11.2 are borrowed from Chvátal (1983).


CHAPTER  12 


Regression 


In  this  chapter,  we  shall  study  an  application  of  linear  programming  to  an  area 
of  statistics  called  regression.  As  a  specific  example,  we  shall  use  size  and  iteration- 
count  data  collected  from  a  standard  suite  of  linear  programming  problems  to  derive 
a  regression  estimate  of  the  number  of  iterations  needed  to  solve  problems  of  a 
given  size. 


1.  Measures  of  Mediocrity 


We  begin  our  discussion  with  an  example.  Here  are  the  midterm  exam  scores 
for  a  linear  programming  course: 


28,  62,  80,  84,  86,  86,  92,  95,  98. 


Let $m$ denote the number of exam scores (i.e., $m = 9$) and let $b_i$, $i = 1, 2, \ldots, m$,
denote the actual scores (arranged in increasing order as above). The most naive
measure of the “average”¹ score is just the mean value, $\bar{x}$, defined by

\[ \bar{x} = \frac{1}{m} \sum_{i=1}^{m} b_i = 79. \]


This  is  an  example  of  a  statistic,  which,  by  definition,  is  a  function  of  a  set  of 
data.  Statistics  are  computed  so  that  one  does  not  need  to  bother  with  reporting 
large  tables  of  raw  numbers.  (Admittedly,  the  task  of  reporting  the  above  list  of 
nine  exam  scores  is  not  very  onerous,  but  this  is  just  an  example.)  Now,  suppose 
the  professor  in  question  did  not  report  the  scores  but  instead  just  gave  summary 
statistics.  Consider  the  student  who  got  an  80  on  the  exam.  This  student  surely 
didn’t  feel  great  about  this  score  but  might  have  thought  that  at  least  it  was  better 
than  average.  However,  as  the  raw  data  makes  clear,  this  student  really  did  worse 
than  average  on  the  exam  (the  professor  confesses  that  the  exam  was  rather  easy). 
In  fact,  out  of  the  nine  students,  the  one  who  got  an  80  scored  third  from  the  bottom 
of  the  class.  Furthermore,  the  student  who  scored  worst  on  the  exam  did  so  badly 
that  one  might  expect  this  student  to  drop  the  course,  thereby  making  the  80  look 
even  worse. 

¹“Average” is usually taken as synonymous with “mean” but in this section we shall use it in an
imprecise sense, employing other technically defined terms for specific meanings.

Figure 12.1. The objective function whose minimum occurs at
the median.

Any statistician would, of course, immediately suggest that we report the
median score instead of the mean. The median score is, by definition, that score
which is worse than half of the other scores and better than the other half. In other
words, the median $\hat{x}$ is defined as

\[ \hat{x} = b_{(m+1)/2} = 86. \]

(Here and in various places in this chapter, we shall assume that $m$ is odd so that
certain formulas such as this one remain fairly simple.) Clearly, the 86 gives a more
accurate indication of what the average score on the exam was.
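The two summary statistics for the nine exam scores can be reproduced with Python's standard `statistics` module. This is only a small illustration; the variable names are chosen here.

```python
from statistics import mean, median

scores = [28, 62, 80, 84, 86, 86, 92, 95, 98]

mean_score = mean(scores)                    # sum of the scores divided by m
median_score = median(scores)                # middle score, b_{(m+1)/2} for odd m
rank_of_80 = sorted(scores).index(80) + 1    # position counted from the bottom

print(mean_score, median_score, rank_of_80)
```

The score of 80 sits just above the mean of 79 yet is only third from the bottom, which is the mismatch that the median of 86 exposes.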

There is a close connection between these statistical concepts and optimization.
For example, the mean $\bar{x}$ minimizes, over all real numbers $x$, the sum of the squared
deviations between the data points and $x$ itself. That is,

\[ \bar{x} = \operatorname{argmin}_{x \in \mathbb{R}} \sum_{i=1}^{m} (x - b_i)^2. \]

To verify this claim, we let $f(x) = \sum_{i=1}^{m} (x - b_i)^2$, differentiate with respect to $x$,
and set the derivative to zero to get

\[ f'(x) = \sum_{i=1}^{m} 2(x - b_i) = 0. \]

Solving this equation for the critical point² $x$, we see that

\[ x = \frac{1}{m} \sum_{i=1}^{m} b_i = \bar{x}. \]

The fact that this critical point is a minimum rather than a maximum (or a saddle
point) follows from the fact that $f''(x) > 0$ for all $x \in \mathbb{R}$.

²Recall from calculus that a critical point is any point at which the derivative vanishes or fails to
exist.

The median $\hat{x}$ also enjoys a close connection with optimization. Indeed, it is
the point that minimizes the sum of the absolute values of the difference between
each data point and itself. That is,

\[ \hat{x} = \operatorname{argmin}_{x \in \mathbb{R}} \sum_{i=1}^{m} |x - b_i|. \]

To see that this is correct, we again use calculus. Let

\[ f(x) = \sum_{i=1}^{m} |x - b_i|. \]

This function is continuous, piecewise linear, and convex (see Figure 12.1). However,
it is not differentiable at the data points. Nonetheless, we can look at its
derivative at other points to see where it jumps across zero. The derivative, for
$x \notin \{b_1, b_2, \ldots, b_m\}$, is

\[ f'(x) = \sum_{i=1}^{m} \operatorname{sgn}(x - b_i), \]

where

\[
\operatorname{sgn}(x) =
\begin{cases}
1 & \text{if } x > 0 \\
0 & \text{if } x = 0 \\
-1 & \text{if } x < 0.
\end{cases}
\]

Hence, we see that the derivative at $x$ is just the number of data points to the left of
$x$ minus the number of data points to the right. Clearly, this derivative jumps across
zero at the median, implying that the median is the minimum.
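Both optimization characterizations can be checked numerically on the exam scores. The sketch below makes one simplifying assumption: instead of minimizing over all real numbers, it searches a grid of candidate values from 20.0 to 100.0 in steps of 0.1. The function names are chosen here.

```python
scores = [28, 62, 80, 84, 86, 86, 92, 95, 98]

def sum_of_squares(x):
    return sum((x - b) ** 2 for b in scores)

def sum_of_abs(x):
    return sum(abs(x - b) for b in scores)

def abs_slope(x):
    """Derivative of sum_of_abs away from the data points: (# left) - (# right)."""
    return sum(1 for b in scores if b < x) - sum(1 for b in scores if b > x)

grid = [k / 10 for k in range(200, 1001)]   # candidate values 20.0, 20.1, ..., 100.0

best_for_squares = min(grid, key=sum_of_squares)
best_for_abs = min(grid, key=sum_of_abs)

print(best_for_squares, best_for_abs)
print(abs_slope(85.5), abs_slope(86.5))
```

On this grid the squared deviations are smallest at the mean, 79, and the absolute deviations are smallest at the median, 86, where the slope changes sign from $-1$ just to the left to $3$ just to the right.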

In this chapter, we shall discuss certain generalizations of means and medians
called regressions. At the end, we will consider a specific example that is of
particular interest to us: the empirical average performance of the simplex method.


2.  Multidimensional  Measures:  Regression  Analysis 

The analysis of the previous section can be recast as follows. Given a “random”
observation $b$, we assume that it consists of two parts: a fixed, but unknown, part
denoted by $x$ and a random fluctuation about this fixed part, which we denote by $\epsilon$.
Hence,

\[ b = x + \epsilon. \]

Now, if we take several observations and index them as $i = 1, 2, \ldots, m$, the $b$’s and
the $\epsilon$’s will vary, but $x$ is assumed to be the same for all observations. Therefore, we
can summarize the situation by writing

\[ b_i = x + \epsilon_i, \qquad i = 1, 2, \ldots, m. \]

We now see that the mean is simply the value of $x$ that minimizes the sum of the
squares of the $\epsilon_i$’s. Similarly, the median is the value of $x$ that minimizes the sum
of the absolute values of the $\epsilon_i$’s.

Sometimes one wishes to do more than merely identify some sort of “average.”
For example, a medical researcher might collect blood pressure data on thousands
of patients with the aim of identifying how blood pressure depends on age, obesity
(defined as weight over height), sex, etc. So associated with each observation $b$ of
a blood pressure are values of these control variables. Let’s denote by $a_1$ the age
of a person, $a_2$ the obesity, $a_3$ the sex, etc. Let $n$ denote the number of different
control variables being considered by the researcher. In (linear) regression analysis,
we assume that the response $b$ depends linearly on the control variables. Hence, we
assume that there are (unknown) numbers $x_j$, $j = 1, 2, \ldots, n$, such that

\[ b = \sum_{j=1}^{n} a_j x_j + \epsilon. \]

This equation is referred to as the regression model. Of course, the researcher
collects data from thousands of patients, and so the data items, $b$ and the $a_j$’s, must be
indexed over these patients. That is,

\[ b_i = \sum_{j=1}^{n} a_{ij} x_j + \epsilon_i, \qquad i = 1, 2, \ldots, m. \]

If we let $b$ denote the vector of observations, $\epsilon$ the vector of random fluctuations,
and $A$ the matrix whose $i$th row consists of the values of the control variables for
the $i$th patient, then the regression model can be expressed in matrix notation as

\[ b = Ax + \epsilon. \tag{12.1} \]

In regression analysis, the goal is to find the vector $x$ that best explains the
observations $b$. Hence, we wish to pick values that minimize, in some sense, the
vector $\epsilon$. Just as for the mean and median, we can consider minimizing either
the sum of the squares of the $\epsilon_i$’s or the sum of the absolute values of the $\epsilon_i$’s. There
are even other possibilities. In the next two sections, we will discuss the range of
possibilities and then give specifics for the two mentioned above.

3. $L^2$-Regression

There are several notions of the size of a vector. The most familiar one is the
Euclidean length

\[ \|y\|_2 = \Bigl( \sum_i y_i^2 \Bigr)^{1/2}. \]

This notion of length corresponds to our physical notion (at least when the dimension
is low, such as 1, 2, or 3). However, one can use any power inside the sum
as long as the corresponding root accompanies it on the outside of the sum. For
$1 \le p < \infty$, we get then the so-called $L^p$-norm of a vector $y$:

\[ \|y\|_p = \Bigl( \sum_i |y_i|^p \Bigr)^{1/p}. \]

Other than $p = 2$, the second most important case is $p = 1$ (and the third most
important case corresponds to the limit as $p$ tends to infinity).

Measuring the size of $\epsilon$ in (12.1) using the $L^2$-norm, we arrive at the
$L^2$-regression problem, which is to find $\bar{x}$ that attains the minimum $L^2$-norm for the
difference between $b$ and $Ax$. Of course, it is entirely equivalent to minimize the
square of the $L^2$-norm, and so we get

\[ \bar{x} = \operatorname{argmin}_x \|b - Ax\|_2^2. \]

Just as for the mean, there is an explicit formula for $\bar{x}$. To find it, we again rely on
elementary calculus. Indeed, let

\[ f(x) = \|b - Ax\|_2^2 = \sum_i \Bigl( b_i - \sum_j a_{ij} x_j \Bigr)^2. \]

In this multidimensional setting, a critical point is defined as a point at which the
derivative with respect to every variable vanishes. So if we denote a critical point
by $\bar{x}$, we see that it must satisfy the following equations:

\[ \frac{\partial f}{\partial x_k}(\bar{x}) = \sum_i 2 \Bigl( b_i - \sum_j a_{ij} \bar{x}_j \Bigr) (-a_{ik}) = 0, \qquad k = 1, 2, \ldots, n. \]

Simplifying these equations, we get

\[ \sum_i a_{ik} b_i = \sum_i \sum_j a_{ik} a_{ij} \bar{x}_j, \qquad k = 1, 2, \ldots, n. \]

In matrix notation, these equations can be summarized as follows:

\[ A^T b = A^T A \bar{x}. \]

In other words, assuming that $A^T A$ is invertible, we get

\[ \bar{x} = (A^T A)^{-1} A^T b. \tag{12.2} \]

This is the formula for $L^2$-regression. It is also commonly called least squares
regression. In Section 12.6, we will use this formula to solve a specific regression
problem.

Example. The simplest and most common regression model arises when one
wishes to describe a response variable $b$ as a linear function of a single input variable
$a$. In this case, the model is

\[ b = a x_1 + x_2. \]

The unknowns here are the slope $x_1$ and the intercept $x_2$. Figure 12.2 shows a plot
of three pairs $(a, b)$ through which we want to draw the “best” straight line. At first
glance, this model does not seem to fit the regression paradigm, since regression
models (as we’ve defined them) do not involve a term for a nonzero intercept. But
the model here can be made to fit by introducing a new control variable, say, $a_2$,
which is always set to 1. While we’re at it, let’s change our notation for $a$ to $a_1$ so
that the model can now be written as

\[ b = a_1 x_1 + a_2 x_2. \]

The three data points can then be summarized in matrix notation as

\[
\begin{bmatrix} 1 \\ 2.5 \\ 3 \end{bmatrix}
=
\begin{bmatrix} 0 & 1 \\ 2 & 1 \\ 4 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{bmatrix}.
\]

Figure 12.2. Three data points for a linear regression.
4. $L^1$-Regression

Just as the median gives a more robust estimate of the “average value” of a
collection of numbers than the mean, $L^1$-regression is less sensitive to outliers than
least squares regression is. It is defined by minimizing the $L^1$-norm of the deviation
vector in (12.1). That is, the problem is to find $\hat{x}$ as follows:

\[ \hat{x} = \operatorname{argmin}_x \|b - Ax\|_1. \]

Unlike for least squares regression, there is no explicit formula for the solution to
the $L^1$-regression problem. However, the problem can be reformulated as a linear
programming problem. Indeed, it is easy to see that the $L^1$-regression problem,

\[ \text{minimize} \quad \sum_i \Bigl| b_i - \sum_j a_{ij} x_j \Bigr|, \]

can be rewritten as

\[
\begin{array}{ll}
\text{minimize}   & \sum_i t_i \\
\text{subject to} & t_i - \Bigl| b_i - \sum_j a_{ij} x_j \Bigr| = 0, \qquad i = 1, 2, \ldots, m,
\end{array}
\]

which is equivalent to the following linear programming problem:

\[
\begin{array}{ll}
\text{minimize}   & \sum_i t_i \\
\text{subject to} & -t_i \le b_i - \sum_j a_{ij} x_j \le t_i, \qquad i = 1, 2, \ldots, m.
\end{array}
\tag{12.3}
\]

Hence, to solve the $L^1$-regression problem, it suffices to solve this linear programming
problem. In the next section, we shall present an alternative algorithm for
computing the solution to an $L^1$-regression problem.

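Example. Returning to the example of the last section, the $L^1$-regression problem
is solved by finding the optimal solution to the following linear programming
problem:

\[
\begin{array}{lrcrcrcr}
\text{minimize}   & \multicolumn{7}{l}{t_1 + t_2 + t_3} \\
\text{subject to} &       &   & -x_2 & - & t_1 & \le & -1   \\
                  & -2x_1 & - &  x_2 & - & t_2 & \le & -2.5 \\
                  & -4x_1 & - &  x_2 & - & t_3 & \le & -3   \\
                  &       &   &  x_2 & - & t_1 & \le & 1    \\
                  &  2x_1 & + &  x_2 & - & t_2 & \le & 2.5  \\
                  &  4x_1 & + &  x_2 & - & t_3 & \le & 3    \\
                  & \multicolumn{7}{l}{t_1, t_2, t_3 \ge 0.}
\end{array}
\]

The solution to this linear programming problem is

\[ \hat{x} = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}, \]

which clearly indicates that the point $(2, 2.5)$ is viewed by the $L^1$-regression as an
outlier, since the regression line passes exactly through the other two points.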
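For a problem this small, the answer can be confirmed without a linear programming solver. The sketch below does not implement the simplex method; it relies on the standard fact that a linear program with an optimal solution has one at a vertex, which for fitting a line means that some best $L^1$ line passes through at least two of the data points. It therefore tries the line through every pair of points and keeps the one with the smallest sum of absolute deviations. The data are the three points encoded in the constraints above, and all names are chosen here.

```python
from fractions import Fraction as F
from itertools import combinations

points = [(F(0), F(1)), (F(2), F(5, 2)), (F(4), F(3))]

def l1_error(slope, intercept):
    """Sum of absolute deviations of the data from the line b = slope*a + intercept."""
    return sum(abs(b - (slope * a + intercept)) for a, b in points)

best = None
for (a1, b1), (a2, b2) in combinations(points, 2):
    if a1 == a2:
        continue                          # a vertical line is not of the form b = x1*a + x2
    slope = (b2 - b1) / (a2 - a1)
    intercept = b1 - slope * a1
    candidate = (l1_error(slope, intercept), slope, intercept)
    if best is None or candidate < best:
        best = candidate

error, slope, intercept = best
print(error, slope, intercept)
```

The search returns slope 1/2 and intercept 1 with total deviation 1/2, all of which comes from the point $(2, 2.5)$. Enumerating pairs grows quickly with the number of points and unknowns, so for anything larger the linear program (12.3) is the practical route.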


5.  Iteratively  Reweighted  Least  Squares 

Even  though  calculus  cannot  be  used  to  obtain  an  explicit  formula  for  the 
solution  to  the  L1  -regression  problem,  it  can  be  used  to  obtain  an  iterative  procedure 
that,  when  properly  initialized,  converges  to  the  solution  of  the  L1  -regression  prob¬ 
lem.  The  resulting  iterative  process  is  called  iteratively  reweighted  least  squares.  In 
this  section,  we  briefly  discuss  this  method.  We  start  by  considering  the  objective 
function  for  L1  -regression: 
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Ax 


1 


h 


E 


&ij  Xj 


J 


Differentiating  this  objective  function  is  a  problem,  since  it  involves  absolute  values. 
However,  the  absolute  value  function 


z 


is  differentiable  everywhere  except  at  one  point:  z  =  0.  Furthermore,  we  can  use 
the  following  simple  formula  for  the  derivative,  where  it  exists: 

Using  this  formula  to  differentiate  /  with  respect  to  each  variable,  and  setting  the 
derivatives  to  zero,  we  get  the  following  equations  for  critical  points: 


(12.4) 


df_ 

dxk 


E 


bi  XI  j  aij  xj 


bi  aijxj 


(  Q*ik)  0?  k  1,  2,  .  .  .  5  71. 


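If we introduce the following shorthand notation for the deviations,

\[ \epsilon_i(x) = \Bigl| b_i - \sum_j a_{ij} x_j \Bigr|, \]

we see that we can rewrite (12.4) as

\[ \sum_i \frac{a_{ik} b_i}{\epsilon_i(x)} = \sum_i \sum_j \frac{a_{ik} a_{ij} x_j}{\epsilon_i(x)}, \qquad k = 1, 2, \ldots, n. \]

Now, if we let $E_x$ denote the diagonal matrix containing the vector $\epsilon(x)$ on the
diagonal, we can write these equations in matrix notation as follows:

\[ A^T E_x^{-1} b = A^T E_x^{-1} A x. \]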

This equation can’t be solved for x as we were able to do in $L^2$-regression because
of the dependence of the diagonal matrix on x. But let us rearrange this system of
equations by multiplying both sides by the inverse of $A^T E_x^{-1} A$. The result is

\[ x = (A^T E_x^{-1} A)^{-1} A^T E_x^{-1} b. \]

This formula suggests an iterative scheme that hopefully converges to a solution.
Indeed, we start by initializing $x^0$ arbitrarily and then use the above formula to
successively compute new approximations. If we let $x^k$ denote the approximation
at the kth iteration, then the update formula can be expressed as

\[ x^{k+1} = (A^T E_{x^k}^{-1} A)^{-1} A^T E_{x^k}^{-1} b. \]

Assuming only that the matrix inverse exists at every iteration, one can show that
this iteration scheme converges to a solution to the $L^1$-regression problem.
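
The update formula can be tried out on a small problem. The sketch below is only an illustration under stated assumptions: it handles the two-parameter line-fitting case (rows of A of the form (a_i, 1)), it reuses the three data points (0, 1), (2, 2.5), (4, 3) from the example of the previous section, and it floors each deviation at a small positive number so that the weights stay finite. That floor is a safeguard added here, not part of the scheme described above.

```python
def irls_line(points, iterations=60, floor=1e-8):
    """Iteratively reweighted least squares for the L1 fit b ~ x1*a + x2."""
    x1, x2 = 0.0, 0.0  # arbitrary starting point x^0
    for _ in range(iterations):
        # Weights 1/eps_i(x); the floor (an added safeguard) avoids dividing by zero.
        w = [1.0 / max(abs(b - (x1 * a + x2)), floor) for a, b in points]
        # Entries of A^T E^{-1} A and A^T E^{-1} b for rows (a_i, 1).
        m11 = sum(wi * a * a for wi, (a, _) in zip(w, points))
        m12 = sum(wi * a for wi, (a, _) in zip(w, points))
        m22 = sum(w)
        r1 = sum(wi * a * b for wi, (a, b) in zip(w, points))
        r2 = sum(wi * b for wi, (_, b) in zip(w, points))
        det = m11 * m22 - m12 * m12
        # Solve the 2x2 weighted normal equations by Cramer's rule.
        x1, x2 = (m22 * r1 - m12 * r2) / det, (m11 * r2 - m12 * r1) / det
    return x1, x2


points = [(0.0, 1.0), (2.0, 2.5), (4.0, 3.0)]
slope, intercept = irls_line(points)
print(slope, intercept)
```

On this data the iterates approach the line with slope 0.5 and intercept 1, the same line found by the linear programming formulation.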
6. An Example: How Fast Is the Simplex Method?

In  Chapter  4,  we  discussed  the  worst-case  behavior  of  the  simplex  method  and 
studied  the  Klee-Minty  problem  that  achieves  the  worst  case.  We  also  discussed  the 
importance  of  empirical  studies  of  algorithm  performance.  In  this  section,  we  shall 
introduce  a  model  that  allows  us  to  summarize  the  results  of  these  empirical  studies. 

We  wish  to  relate  the  number  of  simplex  iterations  T  required  to  solve  a  lin¬ 
ear  programming  problem  to  the  number  of  constraints  m  and/or  the  number  of 
variables  n  in  the  problem  (or  some  combination  of  the  two).  As  any  statistician 
will  report,  the  first  step  is  to  introduce  an  appropriate  model.  Hence,  we  begin 
by  asking:  how  many  iterations,  on  average,  do  we  expect  the  simplex  method  to 
take  if  the  problem  has  m  constraints  and  n  variables?  To  propose  an  answer  to 
this question, consider the initial dictionary associated with a given problem. This
dictionary involves m values, $x_B^*$, for the primal basic variables, and n values, $y_N^*$,
for the dual nonbasic variables. We would like each of these m + n variables to
have nonnegative values, since that would indicate optimality. If we assume that
the initial dictionary is nondegenerate, then one would expect on the average that
(m + n)/2 of the values would be positive and the remaining (m + n)/2 values
would be negative.

Now  let’s  look  at  the  dynamics  of  the  simplex  method.  Each  iteration  focuses 
on  exactly  one  of  the  negative  values.  Suppose,  for  the  sake  of  discussion,  that  the 
negative  value  corresponds  to  a  dual  nonbasic  variable,  that  is,  one  of  the  coeffi¬ 
cients  in  the  objective  row  of  the  dictionary.  Then  the  simplex  method  selects  the 
corresponding  primal  nonbasic  variable  to  enter  the  basis,  and  a  leaving  variable  is 
chosen  by  a  ratio  test.  After  the  pivot,  the  variable  that  exited  now  appears  as  a 
nonbasic  variable  in  the  same  position  that  the  entering  variable  held  before.  Fur¬ 
thermore,  the  coefficient  on  this  variable  is  guaranteed  to  be  positive  (since  we’ve 
assumed nondegeneracy). Hence, the effect of one pivot of the simplex method is
to correct the sign of one of the negative values from the list of m + n values of
interest. Of course, the pivot also affects all the other values, but there seems no
reason  to  assume  that  the  situation  relative  to  them  will  have  any  tendency  to  get 
better  or  worse,  on  the  average.  Therefore,  we  can  think  of  the  simplex  method  as 
statistically  reducing  the  number  of  negative  values  by  one  at  each  iteration. 

Since we expect on the average that an initial dictionary will have (m + n)/2
negative values, it follows that the simplex method should take (m + n)/2 iterations,
on average. Of course, these expectations are predicated on the assumption
that degenerate dictionaries don’t arise. As we saw in Section 7.2, the self-dual simplex
method initialized with random perturbations will, with probability one, never
encounter a degenerate dictionary. Hence, we hypothesize that this variant of the
simplex method will, on average, take (m + n)/2 iterations. It is important to note
the main point of our hypothesis; namely, that the number of iterations is linear in
m + n as opposed, say, to quadratic or cubic.


¹In the social sciences, a fundamental difficulty is the lack of specific arguments validating the
appropriateness of the models commonly introduced.

We can test our hypothesis by first supposing that T can be approximated by a
function of the form

\[ T \approx 2^{\alpha} (m + n)^{\beta} \]

for a pair of real numbers $\alpha$ and $\beta$. Our goal then is to find the value for these
parameters that best fits the data obtained from a set of empirical observations.
(We’ve written the leading constant as $2^{\alpha}$ simply for symmetry with the other
factor; there is no fundamental need to do this.) This multiplicative representation
of the number of iterations can be converted into an additive (in $\alpha$ and $\beta$) representation
by taking logarithms. Introducing an $\epsilon$ to represent the difference between the
model’s prediction and the true number of iterations, we see that the model can be
written as

\[ \log T = \alpha \log 2 + \beta \log(m + n) + \epsilon. \]

Now, suppose that several observations are made. Using subscripts to distinguish
the various observations, we get the following equations:

\[
\begin{bmatrix} \log T_1 \\ \log T_2 \\ \vdots \\ \log T_k \end{bmatrix}
=
\begin{bmatrix} \log 2 & \log(m_1 + n_1) \\ \log 2 & \log(m_2 + n_2) \\ \vdots & \vdots \\ \log 2 & \log(m_k + n_k) \end{bmatrix}
\begin{bmatrix} \alpha \\ \beta \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_k \end{bmatrix}.
\]

If we let b denote the vector on the left, A the matrix on the right, x the vector
multiplied by A, and $\epsilon$ the vector of deviations, then the model can be expressed as

\[ b = Ax + \epsilon, \]

where A and b are given. As we’ve seen, this is just a regression model, which we
can solve as an $L^1$-regression or as an $L^2$-regression.
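
To make the $L^2$ version of this model concrete, here is a minimal sketch that builds the two-column matrix with rows (log 2, log(m + n)) and solves the 2x2 normal equations directly. The function name and the data are hypothetical: the sample sizes are made up and the iteration counts are generated to follow T = (m + n)/2 exactly, so the fit should recover $\alpha = -1$ and $\beta = 1$.

```python
import math


def fit_power_law(sizes, iterations):
    """L2 fit of log T = alpha*log 2 + beta*log(size); returns (alpha, beta)."""
    log2 = math.log(2.0)
    rows = [(log2, math.log(s)) for s in sizes]   # rows of the matrix A
    rhs = [math.log(t) for t in iterations]       # the vector b
    # Normal equations (A^T A) x = A^T b for the two unknowns.
    m00 = sum(r[0] * r[0] for r in rows)
    m01 = sum(r[0] * r[1] for r in rows)
    m11 = sum(r[1] * r[1] for r in rows)
    r0 = sum(r[0] * v for r, v in zip(rows, rhs))
    r1 = sum(r[1] * v for r, v in zip(rows, rhs))
    det = m00 * m11 - m01 * m01
    alpha = (m11 * r0 - m01 * r1) / det
    beta = (m00 * r1 - m01 * r0) / det
    return alpha, beta


sizes = [50, 200, 800, 3200]              # made-up values of m + n
iterations = [0.5 * s for s in sizes]     # synthetic data with T = (m + n)/2
alpha, beta = fit_power_law(sizes, iterations)
print(alpha, beta)
```

With real iteration counts in place of the synthetic ones, the same function would return the least-squares estimates of the two exponents.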

Given real data, we shall solve this model both ways. Table 12.1 shows specific
data obtained by running the self-dual simplex method described in Chapter 7 (with
randomized initial perturbations) against most of the problems in a standard suite
of test problems (called the netlib suite; Gay 1985). Some problems were too big
to run on the workstation used for this experiment, and others were formulated with
free variables that the code was not equipped to handle.

Using (12.2) to solve the problem as an $L^2$-regression, we get

\[ \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} -1.03561 \\ 1.05152 \end{bmatrix}. \]

Or, in other words,

\[ T \approx 0.488 (m + n)^{1.052}. \]

This is amazingly close to our hypothesized formula, (m + n)/2. Figure 12.3 shows
a log-log plot of T vs. m + n with the $L^2$-regression line drawn through it. It is
clear from this graph that a straight line (in the log-log plot) is a good model for
fitting this data.

Using (12.3) to solve the problem as an $L^1$-regression, we get

\[ \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} -0.9508 \\ 1.0491 \end{bmatrix}. \]
Name      m      n      iters     Name      m      n      iters
25fv47    111    1,545  5,089     nesm      646    2,740  5,829
80bau3b   2,021  9,195  10,514    recipe    74     136    80
adlittle  53     96     141       sc105     104    103    92
afiro     25     32     16        sc205     203    202    191
agg2      481    301    204       sc50a     49     48     46
agg3      481    301    193       sc50b     48     48     53
bandm     224    379    1,139     scagr25   347    499    1,336
beaconfd  111    111    113       scagr7    95     139    339
blend     72     83     117       scfxm1    282    439    531
bnl1      564    1,113  2,580     scfxm2    564    878    1,197
bnl2      1,874  3,134  6,381     scfxm3    846    1,317  1,886
boeing1   298    373    619       scorpion  292    331    411
boeing2   125    143    168       scrs8     447    1,131  783
bore3d    138    188    227       scsd1     77     760    172
brandy    123    205    585       scsd6     147    1,350  494
czprob    689    2,770  2,635     scsd8     397    2,750  1,548
d6cube    403    6,183  5,883     sctap1    284    480    643
degen2    444    534    1,421     sctap2    1,033  1,880  1,037
degen3    1,503  1,818  6,398     sctap3    1,408  2,480  1,339
e226      162    260    598       seba      449    896    766
etamacro  334    542    1,580     share1b   107    217    404
fffff800  476    817    1,029     share2b   93     79     189
finnis    398    541    680       shell     487    1,476  1,155
fit1d     24     1,026  925       ship04l   317    1,915  597
fit1p     627    1,677  15,284    ship04s   241    1,291  560
forplan   133    415    576       ship08l   520    3,149  1,091
ganges    1,121  1,493  2,716     ship08s   326    1,632  897
greenbea  1,948  4,131  21,476    ship12l   687    4,224  1,654
grow15    300    645    681       ship12s   417    1,996  1,360
grow22    440    946    999       sierra    1,212  2,016  793
grow7     140    301    322       standata  301    1,038  74
israel    163    142    209       standmps  409    1,038  295
kb2       43     41     63        stocfor1  98     100    81
lotfi     134    300    242       stocfor2  2,129  2,015  2,127
maros     680    1,062  2,998

Table 12.1. Number of iterations for the self-dual simplex method.


In other words,

\[ T \approx 0.517 (m + n)^{1.049}. \]

The fact that this regression formula agrees closely with the $L^2$-regression indicates
that the data set contains no outliers. In Section 12.6.1, we will consider randomly
generated problems and see at least one example where the $L^1$ and $L^2$ regression
lines differ significantly.
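
The leading constants quoted for the two fits follow directly from the fitted exponents, since the model writes the constant as $2^{\alpha}$. The short sketch below recomputes them from the reported $(\alpha, \beta)$ pairs and compares the predicted iteration count with the hypothesized (m + n)/2 for one problem size. The size 1,000 is an arbitrary choice made for illustration, and the helper name is hypothetical.

```python
# (alpha, beta) pairs as reported in the text for the two regressions.
fits = {"L2": (-1.03561, 1.05152), "L1": (-0.9508, 1.0491)}


def predicted_iterations(alpha, beta, size):
    """Evaluate the model T = 2**alpha * size**beta."""
    return 2.0 ** alpha * size ** beta


# Leading constant 2**alpha of each fitted power law.
leading = {name: 2.0 ** alpha for name, (alpha, _) in fits.items()}

size = 1000  # arbitrary illustrative value of m + n
for name, (alpha, beta) in fits.items():
    estimate = predicted_iterations(alpha, beta, size)
    print(name, round(leading[name], 3), round(estimate), "vs", size // 2)
```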

Figure 12.3. A log-log plot of T vs. m + n and the $L^1$ and $L^2$
regression lines.

6.1. Random Problems. Now, let’s consider random problems generated in a
manner similar to the way we did it back in Chapter 4. We do, however, introduce
some changes. First of all, the problems in Chapter 4 were generated in such a
manner as to guarantee primal feasibility, but dual feasibility was left to chance;
that is, many (half) of the problems were unbounded. The problems we wish to
consider now will be assumed to have optimal solutions (real-world problems are
often, but not always, known to have an optimal solution because of the underlying
physical model, and therefore primal or dual infeasibility is often an indicator of
data and/or modeling errors). To guarantee the existence of an optimal solution, we
generate random optimal primal and dual solutions and associated random optimal
slack/surplus variables. Here’s the MATLAB code for that:

    x = round(sigma*rand(n,1)).*(rand(n,1)>0.5);
    y = round(sigma*rand(1,m)).*(rand(1,m)>0.5);
    z = round(sigma*rand(1,n)).*(rand(1,n)>0.5);
    w = round(sigma*rand(m,1)).*(rand(m,1)>0.5);

(as in Chapter 4, sigma is a constant initialized to 10). We then define A, b, and c
as problem data consistent with these optimal solutions:

    A = round(sigma*(randn(m,n))).*(rand(m,n)>0.5);
    b = A*x + w;
    c = y*A - z;

Note  that  we  have  made  one  other  key  change  from  before — we  have  randomly 
forced  about  half  of  the  optimal  values  of  the  variables  and  about  half  of  the  con¬ 
straint  matrix  coefficients  to  be  zero.  This  change  makes  the  problems  slightly  more 
realistic  as  real-world  problems  often  have  much  sparsity. 
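
The construction can be mirrored in a few lines of Python to see why it yields problems with optimal solutions. The sketch below is an illustration, not the code used for the experiments: it assumes the standard form "maximize cx subject to Ax ≤ b, x ≥ 0", uses only the standard `random` module, and the function name `random_lp` is hypothetical. Because w and z are nonnegative, x is primal feasible and y is dual feasible by construction.

```python
import random


def random_lp(m, n, sigma=10, seed=0):
    """Generate (A, b, c) together with a primal-feasible x and dual-feasible y."""
    rng = random.Random(seed)

    def sparse_nonneg(count):
        # round(sigma*U) with U uniform on [0, 1), zeroed with probability 1/2
        return [round(sigma * rng.random()) * (rng.random() > 0.5)
                for _ in range(count)]

    x, z = sparse_nonneg(n), sparse_nonneg(n)
    y, w = sparse_nonneg(m), sparse_nonneg(m)
    # Sparse integer matrix: rounded Gaussian entries, about half forced to zero.
    A = [[round(sigma * rng.gauss(0.0, 1.0)) * (rng.random() > 0.5)
          for _ in range(n)] for _ in range(m)]
    b = [sum(A[i][j] * x[j] for j in range(n)) + w[i] for i in range(m)]
    c = [sum(y[i] * A[i][j] for i in range(m)) - z[j] for j in range(n)]
    return A, b, c, x, y


A, b, c, x, y = random_lp(m=5, n=8)
primal_ok = all(sum(A[i][j] * x[j] for j in range(8)) <= b[i] for i in range(5))
dual_ok = all(sum(y[i] * A[i][j] for i in range(5)) >= c[j] for j in range(8))
print(primal_ok, dual_ok)
```

Since both the primal and the dual are feasible, duality theory guarantees that an optimal solution exists.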

Next, we need to initialize a right-hand side perturbation and an objective function
perturbation:

    b0 = rand(m,1);
    c0 = -rand(1,n);

There are just three relatively simple changes to the code defining the simplex
method itself to convert it from a primal simplex algorithm to the parametric self-dual
method. The first is to change the line of code that checks if the problem has
been solved. Before, we only needed to check if all of the objective coefficients had
become negative (dual feasibility) because primal feasibility was built in. Now, we
have to check both primal and dual feasibility:

    while max(c) > eps || min(b) < -eps,

Secondly, the choice of entering/leaving variables must be updated, as it is now based
on minimizing the perturbation parameter:

    [mu_col, col] = max((-c./c0).*(c0<-eps));
    [mu_row, row] = max((-b./b0).*(b0>eps));
    if mu_col >= mu_row,
        mu = mu_col;
        Acol = A(:,col);
        [t, row] = max(-Acol./(b+mu*b0));
    else
        mu = mu_row;
        Arow = A(row,:);
        [s, col] = max(-Arow./(c+mu*c0));
    end

Finally, as part of every pivot we have to update b0 and c0:

    brow = b0(row);
    b0 = b0 - brow*Acol/a;
    b0(row) = -brow/a;
    ccol = c0(col);
    c0 = c0 - ccol*Arow/a;
    c0(col) = ccol/a;
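
The selection of the critical perturbation value can be isolated in a small function. The sketch below is an illustration under stated assumptions, not a translation of the full pivot code: it assumes the current dictionary is optimal for the perturbed problem whenever b + mu*b0 ≥ 0 and c + mu*c0 ≤ 0, and it returns the smallest mu for which both still hold together with the row or column that attains it. The function name `critical_mu` and the sample numbers are hypothetical.

```python
def critical_mu(b, b0, c, c0, eps=1e-12):
    """Return (mu, kind, index) for the binding perturbed constraint.

    kind is "col" when an objective coefficient binds (a primal pivot follows),
    "row" when a right-hand side binds (a dual pivot follows), and None when
    the dictionary is already optimal at mu = 0.
    """
    best = (0.0, None, None)
    # c_j + mu*c0_j <= 0 with c0_j < 0 requires mu >= -c_j/c0_j.
    for j, (cj, c0j) in enumerate(zip(c, c0)):
        if c0j < -eps and -cj / c0j > best[0]:
            best = (-cj / c0j, "col", j)
    # b_i + mu*b0_i >= 0 with b0_i > 0 requires mu >= -b_i/b0_i.
    # Rows must be strictly larger to win, so ties favor the column.
    for i, (bi, b0i) in enumerate(zip(b, b0)):
        if b0i > eps and -bi / b0i > best[0]:
            best = (-bi / b0i, "row", i)
    return best


# Made-up data for illustration.
b, b0 = [1.0, -2.0], [1.0, 1.0]
c, c0 = [3.0, -1.0], [-1.0, -2.0]
result = critical_mu(b, b0, c, c0)
print(result)
```

Filtering explicitly on the sign of the perturbation plays the role of the elementwise masks `(c0<-eps)` and `(b0>eps)` in the MATLAB lines.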

The code was run 1,000 times. Figure 12.4 shows the number of pivots plotted
against the sum m + n. Just as we saw with the primal simplex method in Chapter 4,
m + n does not seem to be a good measure of problem size, as many problems of a
given size solve much more quickly than the more typical cases. Hence, there are
a number of “outliers.” Overlaid on the scatter plot are the $L^1$ and $L^2$ regression
lines. While neither regression line follows what appears to be an upper line of
points that seems to dominate the results, the $L^1$ line is closer to that than is the $L^2$ line.
The result of the $L^1$-regression is:

\[ T \approx e^{-0.722} e^{1.12 \log(m+n)} = 0.486 (m + n)^{1.12}. \]

The result of the $L^2$-regression is:

\[ T \approx e^{-0.606} e^{1.05 \log(m+n)} = 0.546 (m + n)^{1.05}. \]

Figure 12.4. The parametric self-dual simplex method was used
to solve 1,000 problems known to have an optimal solution.
Shown here is a log-log plot showing the number of pivots
required to reach optimality plotted against m + n. Also shown
are the $L^1$ and $L^2$ regression lines.

Finally, as in Chapter 4, min(m, n) is a better measure of problem size for these
randomly generated problems. Figure 12.5 shows the same data plotted against
min(m, n).

In this case, both regression lines are about the same:

\[ T \approx e^{-0.2} e^{1.46 \log(\min(m,n))} = 0.8 \min(m, n)^{1.46}. \]

Figure 12.5. The parametric self-dual simplex method was used
to solve 1,000 problems known to have an optimal solution.
Shown here is a log-log plot showing the number of pivots
required to reach optimality plotted against min(m, n). In this
case, the $L^1$ and $L^2$ regression lines are almost exactly on top of
each other.


Exercises


12.1 Find the $L^2$-regression line for the data shown in Figure 12.6.

12.2 Find the $L^1$-regression line for the data shown in Figure 12.6.

12.3 Midrange. Given a sorted set of real numbers, $\{b_1, b_2, \ldots, b_m\}$, show that
the midrange, $\hat{x} = (b_1 + b_m)/2$, minimizes the maximum deviation from
the set of observations. That is,

\[ \tfrac{1}{2}(b_1 + b_m) = \operatorname{argmin}_{x \in \mathbb{R}} \max_i |x - b_i|. \]

12.4 Centroid. Given a set of points $\{b_1, b_2, \ldots, b_m\}$ on the plane $\mathbb{R}^2$, show
that the centroid

\[ \hat{x} = \frac{1}{m} \sum_{i=1}^m b_i \]

minimizes the sum of the squares of the distance to each point in the set.
That is, $\hat{x}$ solves the following optimization problem:

\[ \text{minimize} \quad \sum_{i=1}^m \|x - b_i\|_2^2. \]

Note: Each data point $b_i$ is a vector in $\mathbb{R}^2$ whose components are denoted,
say, by $b_{i1}$ and $b_{i2}$, and, as usual, the subscript 2 on the norm denotes the
Euclidean norm. Hence,

\[ \|x - b_i\|_2 = \sqrt{(x_1 - b_{i1})^2 + (x_2 - b_{i2})^2}. \]


12.5 Facility Location. A common problem is to determine where to locate
a facility so that the distance from its customers is minimized. That is,
given a set of points $\{b_1, b_2, \ldots, b_m\}$ on the plane $\mathbb{R}^2$, the problem is to
find $x = (x_1, x_2)$ that solves the following optimization problem:

\[ \text{minimize} \quad \sum_{i=1}^m \|x - b_i\|_2. \]

As for $L^1$-regression, there is no explicit formula for x, but an iterative
scheme can be derived along the same lines as in Section 12.5. Derive an
explicit formula for this iteration scheme.

Figure 12.6. Four data points for a linear regression.

12.6 A Simple Steiner Tree. Suppose there are only three customers in the facility
location problem of the previous exercise. Suppose that the triangle
formed by $b_1$, $b_2$, and $b_3$ has no angles greater than 120°. Show that the
solution x to the facility location problem is the unique point in the triangle
from whose perspective the three customers are 120° apart from each
other. What is the solution if one of the angles, say, at vertex $b_1$, is more
than 120°?

12.7 Sales Force Planning. A distributor of office equipment finds that the
business has seasonal peaks and valleys. The company uses two types of
salespersons: (a) regular employees who are employed year-round and
cost the company $17.50/h (fully loaded for benefits and taxes) and (b)
temporary employees supplied by an outside agency at a cost of $25/h.
Projections for the number of hours of labor by month for the following
year are shown in Table 12.2. Let a_i denote the number of hours of labor
needed for month i and let x denote the number of hours per month of
labor that will be handled by regular employees. To minimize total labor
costs, one needs to solve the following optimization problem:

\[ \text{minimize} \quad \sum_i \bigl( 25 \max(a_i - x, 0) + 17.50\, x \bigr). \]

Jan  390    May  310    Sep  550
Feb  420    Jun  590    Oct  360
Mar  340    Jul  340    Nov  420
Apr  320    Aug  580    Dec  600

Table 12.2. Projected labor hours by month.

(a) Show how to reformulate this problem as a linear programming problem.

(b) Solve the problem for the specific data given above.

(c) Use calculus to find a formula giving the optimal value for x.

12.8 Acceleration Due to Gravity. The law of gravity from classical physics
says that an object dropped from a tall building will, in the absence of air
resistance, have a constant rate of acceleration g so that the height x, as a
function of time t, is given by

\[ x(t) = -\tfrac{1}{2} g t^2. \]

Unfortunately, the effects of air resistance cannot be ignored. To include
them, we assume that the object experiences a retarding force that is directly
proportional to its speed. Letting v(t) denote the velocity of the
object at time t, the equations that describe the motion are then given by

\[ x'(t) = v(t), \quad t > 0, \quad x(0) = 0, \]

\[ v'(t) = -g - f v(t), \quad t > 0, \quad v(0) = 0 \]

(f is the unknown constant of proportionality from the air resistance).
These equations can be solved explicitly for x as a function of t:

\[ x(t) = -\frac{g}{f^2} \bigl( e^{-ft} - 1 + ft \bigr), \]

\[ v(t) = -\frac{g}{f} \bigl( 1 - e^{-ft} \bigr). \]

It is clear from the equation for the velocity that the terminal velocity is
g/f. It would be nice to be able to compute g by measuring this velocity,
but this is not possible, since the terminal velocity involves both f and g.
However, we can use the formula for x(t) to get a two-parameter model
from which we can compute both f and g. Indeed, if we assume that all
measurements are taken after terminal velocity has been “reached” (i.e.,
when $e^{-ft}$ is much smaller than 1), then we can write a simple linear
expression relating position to time:

\[ x = \frac{g}{f^2} - \frac{g}{f} t. \]

Now, in our experiments we shall set values of x (corresponding to specific
positions below the drop point) and measure the time at which the
object passes these positions. Since we prefer to write regression models
with the observed variable expressed as a linear function of the control
variables, let us rearrange the above expression so that t appears as a function
of x:

\[ t = \frac{1}{f} - \frac{f}{g} x. \]

Using this regression model and the data shown in Table 12.3, do an
$L^2$-regression to compute estimates for 1/f and −f/g. From these estimates
derive an estimate for g. If you have access to linear programming
software, solve the problem using an $L^1$-regression and compare
your answers.

Obs. number   Position (m)   Time (s)
1             -10            3.72
2             -20            7.06
3             -30            10.46
4             -10            3.71
5             -20            7.00
6             -30            10.48
7             -10            3.67
8             -20            7.08
9             -30            10.33

Table 12.3. Time at which a falling object passes certain points.


12.9 Iteratively Reweighted Least Squares. Show that the sequence of iterates
in the iteratively reweighted least squares algorithm produces a monotonically
decreasing sequence of objective function values by filling in the
details in the following outline. First, recall that the objective function for
$L^1$-regression is given by

\[ f(x) = \|b - Ax\|_1 = \sum_{i=1}^m \epsilon_i(x), \]

where

\[ \epsilon_i(x) = \Bigl| b_i - \sum_{j=1}^n a_{ij} x_j \Bigr|. \]

Also, the function that defines the iterative scheme is given by

\[ T(x) = (A^T E_x^{-1} A)^{-1} A^T E_x^{-1} b, \]

where $E_x$ denotes the diagonal matrix with the vector $\epsilon(x)$ on its diagonal.
Our aim is to show that

\[ f(T(x)) < f(x). \]

In order to prove this inequality, let

\[ g_x(z) = \sum_{i=1}^m \frac{\epsilon_i^2(z)}{\epsilon_i(x)} = \| E_x^{-1/2} (b - Az) \|_2^2. \]

(a) Use calculus to show that, for each x, T(x) is a global minimum of
$g_x$.

(b) Show that $g_x(x) = f(x)$.

(c) By writing

\[ \epsilon_i(T(x)) = \epsilon_i(x) + \bigl( \epsilon_i(T(x)) - \epsilon_i(x) \bigr) \]

and then substituting the right-hand expression into the definition of
$g_x(T(x))$, show that

\[ g_x(T(x)) \ge 2 f(T(x)) - f(x). \]

(d) Combine the three steps above to finish the proof.


12.10 In our study of means and medians, we showed that the median of a
collection of numbers, $b_1, b_2, \ldots, b_n$, is the number $\hat{x}$ that minimizes
$\sum_j |x - b_j|$. Let $\mu$ be a real parameter.

(a) Give a statistical interpretation to the following optimization problem:

\[ \text{minimize} \quad \sum_j \bigl( |x - b_j| + \mu (x - b_j) \bigr). \]

Hint: the special cases $\mu = 0, \pm 1/2, \pm 1$ might help clarify the general
situation.

(b) Express the above problem as a linear programming problem.

(c) The parametric simplex method can be used to solve families of linear
programming problems indexed by a parameter $\mu$ (such as we
have here). Starting at $\mu = \infty$ and proceeding to $\mu = -\infty$, one
solves all of the linear programs with just a finite number of pivots.
Use the parametric simplex method to solve the problems of the previous
part in the case where n = 4 and $b_1 = 1$, $b_2 = 2$, $b_3 = 4$, and
$b_4 = 8$.

(d) Now consider the general case. Write down the dictionary that
appears in the k-th iteration and show by induction that it is correct.


12.11 Show that the $L^\infty$-norm is just the maximum of the absolute values.
That is,

\[ \lim_{p \to \infty} \|x\|_p = \max_i |x_i|. \]


Notes 

Gonin and Money (1989) and Dodge (1987) are two references on regression
that include discussion of both $L^2$ and $L^1$ regression. The standard reference on $L^1$
regression is Bloomfield and Steiger (1983).

Several researchers, including Smale (1983), Borgwardt (1982, 1987a), Adler
and Megiddo (1985), and Todd (1986), have studied the average number of iterations
of the simplex method as a function of m and/or n. The model discussed in
this chapter is similar to the sign-invariant model introduced by Adler and Berenguer
(1981).


CHAPTER  13 


Financial  Applications 


In  this  chapter,  we  shall  study  some  applications  of  linear  programming  to 
problems  in  quantitative  finance. 

1.  Portfolio  Selection 

Every  investor,  from  the  individual  to  the  professional  fund  manager,  must  de¬ 
cide  on  an  appropriate  mix  of  assets  to  include  in  his  or  her  investment  portfolio. 
Given  a  collection  of  potential  investments  (indexed,  say,  from  1  to  n),  let  Rj  denote 
the  return  in  the  next  time  period  on  investment  j,  j  =  1, . . . ,  n.  In  general,  Rj  is  a 
random  variable,  although  some  investments  may  be  essentially  deterministic. 

A portfolio is determined by specifying what fraction of one’s assets to put into
each investment. That is, a portfolio is a collection of nonnegative numbers $x_j$,
j = 1, …, n, that sum to one. The return (on each dollar) one would obtain using a
given portfolio is given by

\[ R = \sum_j x_j R_j. \]

The reward associated with such a portfolio is defined as the expected return¹:

\[ \mathbb{E} R = \sum_j x_j \mathbb{E} R_j. \]

¹In this chapter, we assume a modest familiarity with the ideas and notations of probability: the
symbol $\mathbb{E}$ denotes expected value, which means that, if R is a random variable that takes values
R(1), R(2), …, R(T) with equal probability, then

\[ \mathbb{E} R = \frac{1}{T} \sum_{t=1}^T R(t). \]

If reward were the only issue, then the problem would be trivial: simply put everything
in the investment with the highest expected return. But unfortunately, investments
with high reward typically also carry a high level of risk. That is, even
though they are expected to do very well in the long run, they also tend to be erratic
in the short term. There are many ways to define risk, some better than others. We
will define the risk associated with an investment (or, for that matter, a portfolio of
investments) to be the mean absolute deviation from the mean (MAD):

\[ \mathbb{E} |R - \mathbb{E} R| = \mathbb{E} \Bigl| \sum_j x_j (R_j - \mathbb{E} R_j) \Bigr| = \mathbb{E} \Bigl| \sum_j x_j \tilde{R}_j \Bigr|, \]

where $\tilde{R}_j = R_j - \mathbb{E} R_j$. One would like to maximize the reward while at the same
time not incurring excessive risk. Whenever confronted with two (or more) competing
objectives, it is necessary to consider a spectrum of possible optimal solutions as
one moves from putting most weight on one objective to the other. In our portfolio
selection problem, we form a linear combination of the reward and the risk
(parametrized here by $\mu$) and maximize that:

\[
\begin{array}{ll}
\text{maximize} & \mu \sum_j x_j \mathbb{E} R_j - \mathbb{E} \bigl| \sum_j x_j \tilde{R}_j \bigr| \\
\text{subject to} & \sum_j x_j = 1 \\
& x_j \ge 0, \quad j = 1, 2, \ldots, n.
\end{array}
\tag{13.1}
\]

Here, $\mu$ is a positive parameter that represents the importance of risk relative to
reward: high values of $\mu$ tend to maximize reward regardless of risk, whereas low
values attempt to minimize risk.

It is important to note that by diversifying (that is, not putting everything into
one investment), it might be possible to reduce the risk without reducing the reward.
To see how this can happen, consider a hypothetical situation involving two investments
A and B. Each year, investment A either goes up 30% or goes down 10%,
but unfortunately, the ups and downs are unpredictable (that is, each year is independent
of the previous years and is an up year with probability 1/2). Investment
B is also highly volatile. In fact, in any year in which A goes up 30%, investment
B goes down 10%, and in the years in which A goes down 10%, B goes up 30%.
Clearly, by putting half of our portfolio into A and half into B, we can create a portfolio
that goes up 10% every year without fail. The act of identifying investments
that are negatively correlated with each other (such as A and B) and dividing the
portfolio among these investments is called hedging. Unfortunately, it is fairly difficult
to find pairs of investments with strong negative correlations. But such negative
correlations do occur. Generally speaking, they can be expected to occur when the
fortunes of both A and B depend on a common underlying factor. For example, a
hot, rainless summer is good for energy but bad for agriculture.

Solving problem (13.1) requires knowledge of the joint distribution of the $R_j$’s.
However, this distribution is not known theoretically but instead must be estimated
by looking at historical data. For example, Table 13.1 shows monthly returns over a
recent 2-year period for one bond fund (3-year Treasury Bonds) and eight different
sector index funds: Materials (XLB), Energy (XLE), Financial (XLF), Industrial
(XLI), Technology (XLK), Staples (XLP), Utilities (XLU), and Healthcare (XLV).
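
The hedging example with investments A and B can be checked numerically once the expectation is replaced by an average over equally likely outcomes, as in the footnote to the definition of reward. The sketch below is a minimal illustration: the function name is hypothetical, and the two scenarios encode the up year and the down year for A described above.

```python
def reward_and_mad(weights, scenarios):
    """Reward (mean return) and risk (mean absolute deviation) of a portfolio.

    scenarios[t][j] is the return of investment j in equally likely scenario t.
    """
    count = len(scenarios)
    portfolio = [sum(x * r for x, r in zip(weights, row)) for row in scenarios]
    reward = sum(portfolio) / count
    mad = sum(abs(r - reward) for r in portfolio) / count
    return reward, mad


# Returns per dollar of (A, B): A up 30% and B down 10%, then the reverse.
scenarios = [(1.30, 0.90), (0.90, 1.30)]

all_in_a = reward_and_mad((1.0, 0.0), scenarios)
hedged = reward_and_mad((0.5, 0.5), scenarios)
print(all_in_a, hedged)
```

Both portfolios have the same reward, a 10% gain per year, but the evenly split portfolio has zero mean absolute deviation, which is the effect the hedging discussion describes.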
Year-     SHY       XLB       XLE       XLF       XLI       XLK       XLP       XLU       XLV
Month     Bonds     Materials Energy    Financial Indust.   Tech.     Staples   Util.     Health
2007-04   1.000     1.044     1.068     1.016     1.035     1.032     1.004     0.987     1.014
2007-03   1.003     1.015     1.051     1.039     1.046     1.047     1.028     1.049     1.073
2007-02   1.005     1.024     1.062     0.994     1.008     1.010     1.021     1.036     1.002
2007-01   1.007     1.027     0.980     0.971     0.989     0.973     0.985     1.053     0.977
2006-12   1.002     1.040     0.991     1.009     1.021     1.020     1.020     0.996     1.030
2006-11   1.001     0.995     0.969     1.030     0.997     0.989     1.020     0.999     1.007
2006-10   1.005     1.044     1.086     1.007     1.024     1.028     0.991     1.026     0.999
2006-09   1.004     1.060     1.043     1.023     1.028     1.040     1.018     1.053     1.003
2006-08   1.004     1.000     0.963     1.040     1.038     1.040     0.999     0.985     1.015
2006-07   1.008     1.030     0.949     1.012     1.011     1.070     1.039     1.028     1.029
2006-06   1.007     0.963     1.034     1.023     0.943     0.974     1.016     1.048     1.055
2006-05   1.002     1.005     1.022     0.995     0.999     0.995     1.018     1.023     1.000
2006-04   1.002     0.960     0.972     0.962     0.983     0.935     1.002     1.016     0.979
2006-03   1.002     1.035     1.050     1.043     1.021     0.987     1.010     1.016     0.969
2006-02   1.002     1.047     1.042     1.003     1.044     1.023     1.008     0.954     0.987
2006-01   1.000     0.978     0.908     1.021     1.031     1.002     1.008     1.013     1.012
2005-12   1.002     1.048     1.146     1.009     1.003     1.034     1.002     1.024     1.013
2005-11   1.004     1.029     1.018     1.000     1.005     0.969     1.001     1.009     1.035
2005-10   1.004     1.076     1.015     1.048     1.058     1.063     1.009     0.999     1.012
2005-09   0.999     1.002     0.909     1.030     0.986     0.977     0.996     0.936     0.969
2005-08   0.997     1.008     1.063     1.009     1.017     1.002     1.014     1.042     0.995
2005-07   1.007     0.958     1.064     0.983     0.976     0.991     0.983     1.006     0.996
2005-06   0.996     1.056     1.071     1.016     1.038     1.057     1.032     1.023     1.023
2005-05   1.002     0.980     1.070     1.012     0.974     0.987     0.981     1.059     0.994

Table 13.1. Monthly returns per dollar for each of nine investments
over 2 years. That is, $1 invested in the energy sector fund
XLE on April 1, 2007, was worth $1.068 on April 30, 2007.


Let \(R_j(t)\) denote the return on investment \(j\) in month \(t\), for
\(t = 1, 2, \ldots, T\), as shown in Table 13.1. One way to estimate the mean
\(\mathbb{E} R_j\) is simply to take the average of the historical returns:

\[
\mathbb{E} R_j = \frac{1}{T} \sum_{t=1}^{T} R_j(t).
\]
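
As a small illustration of this averaging, the following sketch computes the estimated mean return for four of the funds, using only the six most recent rows of Table 13.1. The dictionary layout and the function name `mean_return` are choices made here for illustration and are not part of the book's software.

```python
# Monthly returns per dollar, copied from the six most recent rows of
# Table 13.1 (2007-04 back to 2006-11) for four of the nine funds.
returns = {
    "SHY": [1.000, 1.003, 1.005, 1.007, 1.002, 1.001],
    "XLB": [1.044, 1.015, 1.024, 1.027, 1.040, 0.995],
    "XLE": [1.068, 1.051, 1.062, 0.980, 0.991, 0.969],
    "XLF": [1.016, 1.039, 0.994, 0.971, 1.009, 1.030],
}


def mean_return(history):
    """Average of the historical returns: (1/T) * sum over t of R_j(t)."""
    return sum(history) / len(history)


estimated_means = {fund: mean_return(r) for fund, r in returns.items()}

for fund, r in estimated_means.items():
    print(f"{fund}: {r:.4f}")
```

With only six months of data the estimates are of course rough; the full 24-month table would be used in the same way.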


1.1. Reduction to a Linear Programming Problem. As formulated, the problem
in (13.1) is not a linear programming problem. We use the same trick we used
in the previous chapter to replace each absolute value with a new variable and then
impose inequality constraints that ensure that the new variable will indeed be the
appropriate absolute value once an optimal solution to the problem has been obtained.
But first, let us rewrite (13.1) with the expected value operation replaced by a simple
averaging over the given historical data:


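\[
\begin{array}{ll}
\text{maximize}   & \mu \sum_j x_j r_j - \dfrac{1}{T} \sum_{t=1}^{T} \Bigl| \sum_j x_j \bigl( R_j(t) - r_j \bigr) \Bigr| \\[1ex]
\text{subject to} & \sum_j x_j = 1 \\[0.5ex]
                  & x_j \ge 0, \qquad j = 1, 2, \ldots, n,
\end{array}
\tag{13.2}
\]

where

\[
r_j = \frac{1}{T} \sum_{t=1}^{T} R_j(t)
\]

denotes the expected reward for asset \(j\). Now, replace
\(\bigl| \sum_j x_j ( R_j(t) - r_j ) \bigr|\) with a
new variable \(y_t\) and rewrite the optimization problem as

\[
\begin{array}{lll}
\text{maximize}   & \mu \sum_j x_j r_j - \dfrac{1}{T} \sum_{t=1}^{T} y_t & \\[1ex]
\text{subject to} & -y_t \le \sum_j x_j \bigl( R_j(t) - r_j \bigr) \le y_t, & t = 1, 2, \ldots, T, \\[0.5ex]
                  & \sum_j x_j = 1 & \\[0.5ex]
                  & x_j \ge 0, & j = 1, 2, \ldots, n, \\[0.5ex]
                  & y_t \ge 0, & t = 1, 2, \ldots, T.
\end{array}
\tag{13.3}
\]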


As  we’ve  seen  in  other  contexts  before,  at  optimality  one  of  the  two  inequalities 
involving  yt  must  actually  be  an  equality  because  if  both  inequalities  were  strict 
then  it  would  be  possible  to  further  increase  the  objective  function  by  reducing  yt. 
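
The following sketch evaluates the objective of (13.2) for a given portfolio and makes the role of the \(y_t\) variables concrete: at optimality each \(y_t\) equals the absolute deviation in period \(t\). It uses the small Planet Claire data set from Exercise 13.2; the function name `mad_objective` and the data layout are assumptions made for this illustration. It only evaluates the objective and does not solve the linear program.

```python
# Returns from Exercise 13.2; rows are months (2007-04 back to 2007-01),
# columns are Hair Products, Cosmetics, Cash.
planet_claire = [
    [1.0, 2.0, 1.0],
    [2.0, 2.0, 1.0],
    [2.0, 0.5, 1.0],
    [0.5, 2.0, 1.0],
]


def mad_objective(R, x, mu):
    """Return (objective, reward, risk) of (13.2) for weights x.

    R[t][j] is the return of asset j in period t.
    """
    T, n = len(R), len(x)
    r = [sum(R[t][j] for t in range(T)) / T for j in range(n)]  # r_j
    reward = sum(x[j] * r[j] for j in range(n))
    # Smallest y_t allowed by the two-sided constraint in (13.3):
    # y_t = | sum_j x_j (R_j(t) - r_j) |.
    y = [abs(sum(x[j] * (R[t][j] - r[j]) for j in range(n))) for t in range(T)]
    risk = sum(y) / T
    return mu * reward - risk, reward, risk


all_cash = mad_objective(planet_claire, [0.0, 0.0, 1.0], mu=1.0)
all_cosmetics = mad_objective(planet_claire, [0.0, 1.0, 0.0], mu=1.0)
half_half = mad_objective(planet_claire, [0.5, 0.5, 0.0], mu=1.0)
print(all_cash, all_cosmetics, half_half)
```

Splitting the money between the two volatile assets lowers the risk term considerably compared with holding cosmetics alone, which is the hedging effect described earlier.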


1.2. Solution via Parametric Simplex Method. The problem formulation
given by (13.3) is a linear program that can be solved for any particular value of
\(\mu\) using the methods described in previous chapters. However, we can do much
better than this. The problem is a parametric linear programming problem, where the
parameter is the risk aversion parameter \(\mu\). If we can give a value of \(\mu\) for which a
basic optimal solution is obvious, then we can start from this basic solution and use
the parametric simplex method to find the optimal solution associated with each and
every value of \(\mu\). It is easy to see that for \(\mu\) larger than some threshold, the optimal
solution is to put all of our portfolio into a single asset, the one with the highest
expected reward \(r_j\). Let \(j^*\) denote this highest reward asset:

\[
r_{j^*} \ge r_j \quad \text{for all } j.
\]


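We need to write (13.3) in dictionary form. To this end, let us introduce slack
variables \(w_t^+\) and \(w_t^-\):

\[
\begin{array}{lll}
\text{maximize}   & \mu \sum_j x_j r_j - \dfrac{1}{T} \sum_{t=1}^{T} y_t & \\[1ex]
\text{subject to} & -y_t - \sum_j x_j \bigl( R_j(t) - r_j \bigr) + w_t^- = 0, & t = 1, 2, \ldots, T, \\[0.5ex]
                  & -y_t + \sum_j x_j \bigl( R_j(t) - r_j \bigr) + w_t^+ = 0, & t = 1, 2, \ldots, T, \\[0.5ex]
                  & \sum_j x_j = 1 & \\[0.5ex]
                  & x_j \ge 0, & j = 1, 2, \ldots, n, \\[0.5ex]
                  & y_t, w_t^+, w_t^- \ge 0, & t = 1, 2, \ldots, T.
\end{array}
\]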


We have \(3T + n\) nonnegative variables and \(2T + 1\) equality constraints. Hence, we
need to find \(2T + 1\) basic variables and \(T + n - 1\) nonbasic variables. Since we
know the optimal values for each of the allocation variables, \(x_{j^*} = 1\) and the rest of
the \(x_j\)'s vanish, it is straightforward to figure out the values of the other variables as
well. We can then simply declare any variable that is positive to be basic and declare
the rest to be nonbasic. With this prescription, the variable \(x_{j^*}\) must be basic. The
remaining \(x_j\)'s are nonbasic. Similarly, all of the \(y_t\)'s are nonzero and hence basic.
For each \(t\), either \(w_t^-\) or \(w_t^+\) is basic and the other is nonbasic. To say which is
which, we need to introduce some additional notation. Let

\[
D_{tj} = R_j(t) - r_j .
\]

Then it is easy to check that \(w_t^-\) is basic if \(D_{tj^*} > 0\) and \(w_t^+\) is basic if \(D_{tj^*} < 0\)
(the unlikely case where \(D_{tj^*} = 0\) can be decided arbitrarily). Let

\[
T^+ = \{ t : D_{tj^*} > 0 \} \quad \text{and} \quad T^- = \{ t : D_{tj^*} < 0 \}
\]

and let

\[
\epsilon_t = \begin{cases} 1, & \text{for } t \in T^+ \\ -1, & \text{for } t \in T^- . \end{cases}
\]

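It's tedious, but here's the optimal dictionary:

\[
\begin{array}{rcl}
\zeta & = & -\dfrac{1}{T} \sum_{t=1}^{T} \epsilon_t D_{tj^*}
            - \dfrac{1}{T} \sum_{j \ne j^*} \sum_{t=1}^{T} \epsilon_t ( D_{tj} - D_{tj^*} ) x_j
            - \dfrac{1}{T} \sum_{t \in T^-} w_t^-
            - \dfrac{1}{T} \sum_{t \in T^+} w_t^+ \\[1.5ex]
      &   & {} + \mu r_{j^*} + \mu \sum_{j \ne j^*} ( r_j - r_{j^*} ) x_j \\[2ex]
y_t   & = & -D_{tj^*} - \sum_{j \ne j^*} ( D_{tj} - D_{tj^*} ) x_j + w_t^-, \qquad t \in T^- \\[1.5ex]
w_t^- & = & 2 D_{tj^*} + 2 \sum_{j \ne j^*} ( D_{tj} - D_{tj^*} ) x_j + w_t^+, \qquad t \in T^+ \\[1.5ex]
y_t   & = & D_{tj^*} + \sum_{j \ne j^*} ( D_{tj} - D_{tj^*} ) x_j + w_t^+, \qquad t \in T^+ \\[1.5ex]
w_t^+ & = & -2 D_{tj^*} - 2 \sum_{j \ne j^*} ( D_{tj} - D_{tj^*} ) x_j + w_t^-, \qquad t \in T^- \\[1.5ex]
x_{j^*} & = & 1 - \sum_{j \ne j^*} x_j .
\end{array}
\]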

We can now check that, for large \(\mu\), this dictionary is optimal. Indeed, the objective
coefficients on the \(w_t^-\) and \(w_t^+\) variables in the first row of the objective function
are negative. The coefficients on the \(x_j\)'s in the first row can be positive or negative
but, for \(\mu\) sufficiently large, the negative coefficients on the \(x_j\)'s in the second row
dominate and make all coefficients negative after considering both rows. Similarly,
the fact that all of the basic variables are positive follows immediately from the
definitions of \(T^+\) and \(T^-\).

A few simple inequalities determine the \(\mu\)-threshold above which the given
dictionary is optimal. The parametric simplex method can then be used to
systematically reduce \(\mu\) to zero. Along the way, each dictionary encountered corresponds
to an optimal solution for some range of \(\mu\) values. Hence, in one pass we have
solved the portfolio selection problem for every investor from the bravest to the
most cautious. Figure 13.1 shows all of the optimal portfolios. The set of all
risk-reward profiles that are possible is shown in Figure 13.2. The lower-right boundary
of this set is the so-called efficient frontier. Any portfolio that produces a
risk-reward combination that does not lie on the efficient frontier can be improved either
by increasing its mean reward without changing the risk or by decreasing the risk
without changing the mean reward. Hence, one should only invest in portfolios that
lie on the efficient frontier.

Figure 13.1. Optimal portfolios as a function of risk parameter \(\mu\).
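
The "few simple inequalities" that determine the \(\mu\)-threshold can be sketched in code. The coefficient of a nonbasic \(x_j\) in the objective row of the starting dictionary is \(\mu (r_j - r_{j^*}) - \frac{1}{T} \sum_t \epsilon_t (D_{tj} - D_{tj^*})\), and the dictionary is optimal when every such coefficient is nonpositive. The sketch below solves each inequality for \(\mu\) under the assumption that the highest-reward asset is unique (so \(r_{j^*} - r_j > 0\)); the function name `mu_threshold` is hypothetical, and the data are those of Exercise 13.2.

```python
# Returns from Exercise 13.2; columns are Hair Products, Cosmetics, Cash.
planet_claire = [
    [1.0, 2.0, 1.0],
    [2.0, 2.0, 1.0],
    [2.0, 0.5, 1.0],
    [0.5, 2.0, 1.0],
]


def mu_threshold(R):
    """Return (j*, smallest mu for which the all-in-j* dictionary is optimal).

    Assumes a unique highest-mean asset, so r_j* - r_j > 0 for every j != j*.
    """
    T, n = len(R), len(R[0])
    r = [sum(R[t][j] for t in range(T)) / T for j in range(n)]
    jstar = max(range(n), key=lambda j: r[j])
    D = [[R[t][j] - r[j] for j in range(n)] for t in range(T)]
    # eps_t = +1 on T+ and -1 on T- (ties D = 0 are put in T- arbitrarily).
    eps = [1 if D[t][jstar] > 0 else -1 for t in range(T)]
    bounds = []
    for j in range(n):
        if j == jstar:
            continue
        # mu*(r_j - r_j*) - (1/T) sum_t eps_t (D_tj - D_tj*) <= 0
        # is equivalent to mu >= s / (r_j* - r_j) with s as below.
        s = sum(eps[t] * (D[t][jstar] - D[t][j]) for t in range(T)) / T
        bounds.append(s / (r[jstar] - r[j]))
    return jstar, max(0.0, max(bounds))


jstar, threshold = mu_threshold(planet_claire)
print(jstar, threshold)
```

For this data set the highest-reward asset is Cosmetics (index 1), and the binding inequality comes from Hair Products. Below the threshold the parametric simplex method would pivot to a new dictionary; that step is not shown here.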


2.  Option  Pricing 

Option  pricing  is  one  of  the  fundamental  problems  of  quantitative  finance.  In 
this  section  we  will  describe  briefly  what  an  option  is  and  formulate  upper  and  lower 
bounds  on  the  price  as  a  linear  programming  problem. 

An  option  is  a  derivative  security,  which  means  that  it  is  derived  from  a  simpler 
security  such  as  a  stock.  There  are  many  types  of  options,  some  quite  exotic.  For 
the  purposes  of  this  book,  I  will  only  describe  the  simplest  type  of  option,  the  call 
option.  A  call  option  is  a  contract  between  two  parties  in  which  one  party,  the  buyer, 
is  given  the  option  to  buy  from  the  other  party  a  particular  stock  at  a  particular  price 
at  a  particular  time  some  weeks  or  months  in  the  future.  For  example,  on  June  1st, 
2007,  Apple  Computer  stock  was  selling  for  $121  per  share.  On  this  date,  it  was 
possible to buy an option allowing one to purchase Apple stock for $130 (the
so-called strike price) a share 10 weeks in the future (the expiration date). The seller
was offering this contract for a price of $3.20. Where does this price come from?
The simple answer is that it is determined solely by the marketplace, since option
contracts themselves can be bought and sold up until their expiration date. But, as
technical folks, we seek an analytical formula that tells us what a fair price ought to
be. This we can do.

Figure 13.2. The efficient frontier.

To  explain  how  to  price  the  option,  we  need  to  think  a  little  bit  more  about  the 
value  of  the  option  on  the  date  of  expiration.  If  Apple  stock  does  well  over  the 
next  10  weeks  and  ends  up  at  $140  per  share,  then  on  the  day  of  expiration  I  can 
exercise  the  contract  and  buy  the  stock  for  $130.  I  can  then  immediately  sell  the 
stock  for  $140  and  pocket  the  $10  difference.  Of  course,  I  paid  $3.20  for  the  right 
to  do  this.  Hence,  my  net  profit  is  $6.80.  Now,  suppose  instead  of  rising  to  $140 
per  share,  the  stock  only  climbs  to  $132  per  share.  In  this  case,  I  will  still  want  to 
exercise  the  option  because  I  can  pocket  a  $2  difference.  But,  after  subtracting  the 
cost of the option, I've actually lost a modest $1.20 per share. Next, suppose that
the stock only goes up to $125 per share. In this case, I will let the option expire
without exercising it. I will have lost only the $3.20 that I originally paid for the
option. Finally, consider the case where in the intervening 10 weeks some really
bad news surfaces that drives Apple stock down to $100 per share. Had I actually
bought Apple stock, I would now be out $21 per share, which could be a substantial
amount of money if I had bought lots of shares. But, by buying the option, I'm only
out $3.20 per share. This is the attraction of call options. They allow an investor
who is optimistic about the economy (or a particular company) to take a chance
without risking much on the down side. Figure 13.3 shows a plot of the net profit
per share as a function of the share price at expiration.

Figure 13.3. A graph of the value of the option at expiration as
a function of stock price. In this example, the strike price is $130.
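
The four scenarios just described can be reproduced with a few lines of code. This is only a sketch of the payoff arithmetic for the Apple example (strike price $130, option price $3.20); the function names are chosen here for illustration.

```python
def call_value(stock_price, strike):
    """Value of a call option at expiration: exercise only if it pays."""
    return max(stock_price - strike, 0.0)


def net_profit(stock_price, strike, premium):
    """Value at expiration minus the price originally paid for the option."""
    return call_value(stock_price, strike) - premium


STRIKE = 130.0
PREMIUM = 3.20

profits = {s: round(net_profit(s, STRIKE, PREMIUM), 2) for s in (140, 132, 125, 100)}
for price, profit in profits.items():
    print(f"stock at ${price}: net profit per share ${profit:+.2f}")
```

The loss is capped at the premium no matter how far the stock falls, whereas the owner of the stock itself would bear the full decline.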

Let \(s_0\) denote the (known) current stock price and let \(S_1\) denote the (not yet
known, i.e., random) stock price at expiration. A key feature of options is that their
value at the expiration date is given by a specific function \(h(S_1)\) of the stock price
at expiration. For the specific call option discussed above, the function \(h(S_1)\) is
the "hockey-stick" shaped function shown in Figure 13.3. If we think we know the
distribution of the random variable \(S_1\), then we could compute the expected value
of \(h(S_1)\) and, if we ignore the discounting for inflation, we could use this to price
the option:

\[
p = \mathbb{E}\, h(S_1).
\]

Unfortunately, we generally don't know the distribution of \(S_1\).

We can, however, make some indirect inferences based on "market wisdom"
that constrain the possible values for \(p\) and thereby implicitly tell us something
about the distribution of \(S_1\). Specifically, let us imagine that there are already a
number of options being traded in the market that are based on the same underlying
stock and have the same expiration date. Let us suppose that there are already \(n\)
options being traded in the market with known prices. That is, there are specific
functions \(h_j(S_1)\), \(j = 1, 2, \ldots, n\), for which there are already known prices \(p_j\).
One can think of the underlying stock itself as the simplest possible option. Since
the stock is traded, it too provides some information about the future. We assume
that this trivial asset is the \(j = 1\) option in the collection of known priced options.
For this option, we have \(h_1(S_1) = S_1\) and \(p_1 = s_0\). To these \(n\) options, we add one
more: cash. One dollar today will be worth one dollar on the expiration date (again,
we are ignoring here the time value of money). In a sense, this is also an option.

The problem we wish to consider is how to price a new option whose payout
function we denote by \(g(S_1)\). Consider building a portfolio of the available options
consisting of \(x_0\) "shares" of dollars, \(x_1\) shares of the underlying stock, and \(x_j\) shares
of option \(j\) (\(j = 2, \ldots, n\)). Today, this portfolio costs

\[
x_0 + x_1 s_0 + \sum_{j=2}^{n} x_j p_j .
\]

At the expiration date, the portfolio's value will be

\[
x_0 + x_1 S_1 + \sum_{j=2}^{n} x_j h_j(S_1).
\]

Suppose that no matter how \(S_1\) turns out, the value of the new option dominates that
of the portfolio:

\[
x_0 + x_1 S_1 + \sum_{j=2}^{n} x_j h_j(S_1) \le g(S_1).
\]

Then the price \(p\) of the new option today must also dominate
the cost of this portfolio:

\[
x_0 + x_1 s_0 + \sum_{j=2}^{n} x_j p_j \le p.
\]

This is called a no-arbitrage condition. This no-arbitrage condition implies a lower
bound \(\underline{p}\) on the price of the new option, which we can maximize:

\[
\begin{array}{ll}
\text{maximize}   & x_0 + x_1 s_0 + \sum_{j=2}^{n} x_j p_j \\[1ex]
\text{subject to} & x_0 + x_1 S_1 + \sum_{j=2}^{n} x_j h_j(S_1) \le g(S_1).
\end{array}
\]

This problem actually has an infinite number of constraints because the inequality
must hold no matter what value \(S_1\) takes on. It can be made into a linear
programming problem by introducing a finite set of possible values, say \(s_1(1), s_1(2), \ldots,
s_1(m)\). The resulting linear programming problem can thus be written as

\[
\begin{array}{ll}
\text{maximize}   & \underline{p} = x_0 + x_1 s_0 + \sum_{j=2}^{n} x_j p_j \\[1ex]
\text{subject to} & x_0 + x_1 s_1(i) + \sum_{j=2}^{n} x_j h_j(s_1(i)) \le g(s_1(i)), \qquad i = 1, \ldots, m.
\end{array}
\tag{13.4}
\]

In a completely analogous manner we can find a tight upper bound \(\bar{p}\) for \(p\) by solving
a minimization problem:

\[
\begin{array}{ll}
\text{minimize}   & \bar{p} = x_0 + x_1 s_0 + \sum_{j=2}^{n} x_j p_j \\[1ex]
\text{subject to} & x_0 + x_1 s_1(i) + \sum_{j=2}^{n} x_j h_j(s_1(i)) \ge g(s_1(i)), \qquad i = 1, \ldots, m.
\end{array}
\tag{13.5}
\]

The dual problem associated with (13.4) is

\[
\begin{array}{lll}
\text{minimize}   & \sum_i g(s_1(i))\, y_i & \\[1ex]
\text{subject to} & \sum_i y_i = 1, & \\[0.5ex]
                  & \sum_i s_1(i)\, y_i = s_0, & \\[0.5ex]
                  & \sum_i h_j(s_1(i))\, y_i = p_j, & j = 2, \ldots, n, \\[0.5ex]
                  & y_i \ge 0, & i = 1, \ldots, m.
\end{array}
\]

Note that the first and last constraints tell us that the \(y_i\)'s are a system of
probabilities. Given this interpretation of the \(y_i\)'s as probabilities, the expression

\[
\sum_i s_1(i)\, y_i
\]

is just an expected value of the random variable \(S_1\) computed using these
probabilities. So, the constraint \(\sum_i s_1(i) y_i = s_0\) means that the expected stock price at the
end of the time period must match the current stock price, when computed with the
\(y_i\) probabilities. For this reason, we call these probabilities risk neutral. Similarly,
the constraints \(\sum_i h_j(s_1(i)) y_i = p_j\), \(j = 2, \ldots, n\), tell us that each of the options
must also be priced in such a way that the expected future price matches the current
market price.
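
To see how the primal bounds and the risk-neutral probabilities fit together, here is a small sketch with entirely hypothetical numbers (they do not come from the text): a stock currently at 100, three possible prices at expiration, only cash and the stock available for the portfolio (so \(n = 1\)), and a new call option with strike 110. The code does not solve either linear program; it merely checks feasibility of two hand-picked portfolios and of one probability vector, and then confirms the ordering that weak duality predicts.

```python
# Hypothetical data for illustration only.
s0 = 100.0                       # current stock price
scenarios = [80.0, 100.0, 120.0]  # possible values s_1(i) at expiration


def g(s1):
    """Payout of the new option: a call with (assumed) strike price 110."""
    return max(s1 - 110.0, 0.0)


def cost(x0, x1):
    """Cost today of x0 dollars of cash plus x1 shares of stock."""
    return x0 + x1 * s0


def value(x0, x1, s1):
    """Value of the same portfolio at expiration if the stock ends at s1."""
    return x0 + x1 * s1


def feasible_for_lower_bound(x0, x1):
    """Constraints of (13.4): portfolio never worth more than the option."""
    return all(value(x0, x1, s) <= g(s) for s in scenarios)


def feasible_for_upper_bound(x0, x1):
    """Constraints of (13.5): portfolio never worth less than the option."""
    return all(value(x0, x1, s) >= g(s) for s in scenarios)


def is_risk_neutral(y, tol=1e-12):
    """Dual constraints: probabilities whose expected stock price is s0."""
    return (all(p >= 0 for p in y)
            and abs(sum(y) - 1.0) <= tol
            and abs(sum(p * s for p, s in zip(y, scenarios)) - s0) <= tol)


lower = cost(-50.0, 0.5)     # portfolio -50 + 0.5 * S1 stays below g
upper = cost(-20.0, 0.25)    # portfolio -20 + 0.25 * S1 stays above g
y = [0.25, 0.5, 0.25]
risk_neutral_price = sum(p * g(s) for p, s in zip(y, scenarios))

print(lower, risk_neutral_price, upper)
```

Any risk-neutral probability vector prices the new option somewhere between the cost of a portfolio that is dominated by it and the cost of a portfolio that dominates it.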


Exercises 

13.1 Find every portfolio on the efficient frontier using the most recent 6 months
of data for the Bond (SHY), Materials (XLB), Energy (XLE), and Financial
(XLF) sectors as shown in Table 13.1 (that is, using the upper left 6 × 4
subblock of data).

13.2 On Planet Claire, markets are highly volatile. Here's some recent
historical data:

Year-Month  Hair Products  Cosmetics  Cash
2007-04     1.0            2.0        1.0
2007-03     2.0            2.0        1.0
2007-02     2.0            0.5        1.0
2007-01     0.5            2.0        1.0

Find every portfolio on Planet Claire's efficient frontier.
13.3  What  is  the  dual  of  (13.5)? 


Notes

The  portfolio  selection  problem  originates  with  Markowitz  (1959).  He  won  the 
1990  Nobel  prize  in  Economics  for  this  work.  In  its  original  formulation,  risk  is 
modeled  by  the  variance  of  the  portfolio’s  value  rather  than  the  absolute  deviation 
from  the  mean  considered  here.  We  will  discuss  the  quadratic  formulation  later  in 
Chapter  24. 

The mean absolute deviation (MAD) risk measure we have considered in this chapter
has many nice properties, the most important of which is that it produces portfolios
that are guaranteed not to be stochastically dominated (to second order) by other
portfolios. Many risk measures fail to possess this important property. See
Ruszczynski and Vanderbei (2003) for details.


Part  2 

Network-Type  Problems 


Allow  me  to  say ,  . . . ,  that  the  arguments  with 
which  you  have  supported  this  extraordinary 
application  have  been  as  frivolous  as  the 
application  was  ill-judged.  —  J.  Austen 


CHAPTER  14 


Network  Flow  Problems 


Many  linear  programming  problems  can  be  viewed  as  a  problem  of  minimizing 
the  “transportation”  cost  of  moving  materials  through  a  network  to  meet  demands 
for  material  at  various  locations  given  sources  of  material  at  other  locations.  Such 
problems  are  called  network  flow  problems.  They  form  the  most  important  special 
class  of  linear  programming  problems.  Transportation,  electric,  and  communication 
networks  provide  obvious  examples  of  application  areas.  Less  obvious,  but  just 
as  important,  are  applications  in  facilities  location,  resource  management,  financial 
planning,  and  others. 

In  this  chapter  we  shall  formulate  a  special  type  of  linear  programming  problem 
called  the  minimum-cost  network  flow  problem.  It  turns  out  that  the  simplex  method 
when  applied  to  this  problem  has  a  very  simple  description  and  some  important  spe¬ 
cial  properties.  Implementations  that  exploit  these  properties  benefit  dramatically. 

1.  Networks 

A network consists of two types of objects: nodes and arcs. We shall let \(\mathcal{N}\)
denote the set of nodes. We let \(m\) denote the number of nodes (i.e., the cardinality
of the set \(\mathcal{N}\)).

The  nodes  are  connected  by  arcs.  Arcs  are  assumed  to  be  directed.  This  means 
that  an  arc  connecting  node  i  to  node  j  is  not  the  same  as  an  arc  connecting  node  j 
to  node  i.  For  this  reason,  we  denote  arcs  using  the  standard  mathematical  notation 
for  ordered  pairs.  That  is,  the  arc  connecting  node  i  to  node  j  is  denoted  simply  as 
(i,  j).  We  let  A  denote  the  set  of  all  arcs  in  the  network.  This  set  is  a  subset  of  the 
set  of  all  possible  arcs: 

\[
\mathcal{A} \subset \{ (i,j) : i, j \in \mathcal{N},\ i \ne j \}.
\]

In  typical  networks,  the  set  A  is  much  smaller  than  the  set  of  all  arcs.  In  fact,  usually 
each  node  is  only  connected  to  a  handful  of  “nearby”  nodes. 

The pair \((\mathcal{N}, \mathcal{A})\) is called a network. It is also sometimes called a graph or
a digraph (to emphasize the fact that the arcs are directed). Figure 14.1 shows a
network having 7 nodes and 14 arcs.

To specify a network flow problem, we need to indicate the supply of (or
demand for) material at each node. So, for each \(i \in \mathcal{N}\), let \(b_i\) denote the amount
of material being supplied to the network at node \(i\). We shall use the convention
that negative supplies are in fact demands. Hence, our problem will be to move the
material that sits at the supply nodes over to the demand nodes. The movements
must be along the arcs of the network (and adhering to the directions of the arcs).
Since, except for the supply and demand, there is no other way for material to enter
or leave the system, it follows that the total supply must equal the total demand for
the problem to have a feasible solution. Hence, we shall always assume that

\[
\sum_{i \in \mathcal{N}} b_i = 0.
\]

Figure 14.1. A network having 7 nodes and 14 arcs. The numbers
written next to the nodes denote the supply at the node (negative
values indicate demands; missing values indicate no supply
or demand).

The original version of this chapter was revised. An erratum to this chapter can be found at DOI
10.1007/978-1-4614-7630-6_26

R.J. Vanderbei, Linear Programming, International Series in Operations Research
& Management Science 196, DOI 10.1007/978-1-4614-7630-6_14

To help us decide the paths along which materials should move, we assume
that each arc, say, \((i,j)\), has associated with it a cost \(c_{ij}\) that represents the cost of
shipping one unit from \(i\) to \(j\) directly along arc \((i,j)\). The decision variables then
are how much material to ship along each arc. That is, for each \((i,j) \in \mathcal{A}\), \(x_{ij}\) will
denote the quantity shipped directly from \(i\) to \(j\) along arc \((i,j)\). The objective is to
minimize the total cost of moving the supply to meet the demand:

\[
\text{minimize} \quad \sum_{(i,j) \in \mathcal{A}} c_{ij} x_{ij} .
\]

As we mentioned before, the constraints on the decision variables are that they
ensure flow balance at each node. Let us consider a fixed node, say, \(k \in \mathcal{N}\). The
total flow into node \(k\) is given by

\[
\sum_{i \,:\, (i,k) \in \mathcal{A}} x_{ik} .
\]

Similarly, the total flow out from node \(k\) is

\[
\sum_{j \,:\, (k,j) \in \mathcal{A}} x_{kj} .
\]

The difference between these two quantities is the net inflow, which must be equal
to the demand at the node. Hence, the flow balance constraints can be written as

\[
\sum_{i \,:\, (i,k) \in \mathcal{A}} x_{ik} - \sum_{j \,:\, (k,j) \in \mathcal{A}} x_{kj} = -b_k, \qquad k \in \mathcal{N}.
\]

Finally, the flow on each arc must be nonnegative (otherwise it would be going in
the wrong direction):

\[
x_{ij} \ge 0, \qquad (i,j) \in \mathcal{A}.
\]

Figure 14.2. The costs on the arcs for the network in Figure 14.1.

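Figure 14.2 shows cost information for the network shown in Figure 14.1. In
matrix notation, the problem can be written as follows:

\[
\begin{array}{ll}
\text{minimize}   & c^T x \\
\text{subject to} & A x = -b \\
                  & x \ge 0,
\end{array}
\tag{14.1}
\]

where

\[
x^T = \begin{bmatrix}
x_{ac} & x_{ad} & x_{ae} & x_{ba} & x_{bc} & x_{be} & x_{db} & x_{de} & x_{fa} & x_{fb} & x_{fc} & x_{fg} & x_{gb} & x_{ge}
\end{bmatrix},
\]

\[
A = \begin{bmatrix}
-1 & -1 & -1 &  1 &  0 &  0 &  0 &  0 &  1 &  0 &  0 &  0 &  0 &  0 \\
 0 &  0 &  0 & -1 & -1 & -1 &  1 &  0 &  0 &  1 &  0 &  0 &  1 &  0 \\
 1 &  0 &  0 &  0 &  1 &  0 &  0 &  0 &  0 &  0 &  1 &  0 &  0 &  0 \\
 0 &  1 &  0 &  0 &  0 &  0 & -1 & -1 &  0 &  0 &  0 &  0 &  0 &  0 \\
 0 &  0 &  1 &  0 &  0 &  1 &  0 &  1 &  0 &  0 &  0 &  0 &  0 &  1 \\
 0 &  0 &  0 &  0 &  0 &  0 &  0 &  0 & -1 & -1 & -1 & -1 &  0 &  0 \\
 0 &  0 &  0 &  0 &  0 &  0 &  0 &  0 &  0 &  0 &  0 &  1 & -1 & -1
\end{bmatrix},
\qquad
b = \begin{bmatrix} 0 \\ 0 \\ -6 \\ -6 \\ -2 \\ 9 \\ 5 \end{bmatrix},
\]

\[
c^T = \begin{bmatrix}
48 & 28 & 10 & 7 & 65 & 7 & 38 & 15 & 56 & 48 & 108 & 24 & 33 & 19
\end{bmatrix}.
\]

The rows of \(A\) and the entries of \(b\) correspond to nodes a through g, in that order.

In network flow problems, the constraint matrix \(A\) is called the node-arc
incidence matrix.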
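
The node-arc incidence matrix is easy to build from a list of arcs. The sketch below does so for the 14 arcs of the example, using the sign convention of (14.1): a column has \(-1\) in the row of the arc's tail and \(+1\) in the row of its head. The function name and the plain list-of-lists representation are choices made here for illustration.

```python
nodes = ["a", "b", "c", "d", "e", "f", "g"]
arcs = [
    ("a", "c"), ("a", "d"), ("a", "e"), ("b", "a"), ("b", "c"),
    ("b", "e"), ("d", "b"), ("d", "e"), ("f", "a"), ("f", "b"),
    ("f", "c"), ("f", "g"), ("g", "b"), ("g", "e"),
]
supply = {"a": 0, "b": 0, "c": -6, "d": -6, "e": -2, "f": 9, "g": 5}


def incidence_matrix(nodes, arcs):
    """Node-arc incidence matrix: one row per node, one column per arc."""
    row = {k: r for r, k in enumerate(nodes)}
    A = [[0] * len(arcs) for _ in nodes]
    for col, (i, j) in enumerate(arcs):
        A[row[i]][col] = -1   # the arc leaves its tail i
        A[row[j]][col] = +1   # the arc enters its head j
    return A


A = incidence_matrix(nodes, arcs)
column_sums = [sum(A[r][c] for r in range(len(nodes))) for c in range(len(arcs))]

for name, row_values in zip(nodes, A):
    print(name, row_values)
```

Every column sums to zero, which is why the rows of the matrix are linearly dependent; the supplies also sum to zero, as the feasibility assumption requires.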

The network flow problem differs from our usual standard form linear
programming problem in two respects: (1) it is a minimization instead of a
maximization and (2) the constraints are equalities instead of inequalities. Nonetheless, we
have studied before how duality applies to problems in nonstandard form. The dual
of (14.1) is

\[
\begin{array}{ll}
\text{maximize}   & -b^T y \\
\text{subject to} & A^T y + z = c \\
                  & z \ge 0.
\end{array}
\]

Written in network notation, the dual is

\[
\begin{array}{lll}
\text{maximize}   & -\sum_{i \in \mathcal{N}} b_i y_i & \\[1ex]
\text{subject to} & y_j - y_i + z_{ij} = c_{ij}, & (i,j) \in \mathcal{A} \\[0.5ex]
                  & z_{ij} \ge 0, & (i,j) \in \mathcal{A}.
\end{array}
\]

Finally, it is not hard to check that the complementarity conditions (to be satisfied
by an optimal primal-dual solution pair) are

\[
x_{ij} z_{ij} = 0, \qquad (i,j) \in \mathcal{A}.
\]

We shall often refer to the primal variables as primal flows.


2.  Spanning  Trees  and  Bases 

Network  flow  problems  can  be  solved  efficiently  because  the  basis  matrices 
have  a  special  structure  that  can  be  described  nicely  in  terms  of  the  network.  In 
order  to  explain  this  structure,  we  need  to  introduce  a  number  of  definitions. 

First of all, an ordered list of nodes \((n_1, n_2, \ldots, n_k)\) is called a path in the
network if each adjacent pair of nodes in the list is connected by an arc in the network.
It is important to note that we do not assume that the arcs point in any particular
direction. For example, for nodes \(n_i\) and \(n_{i+1}\), there must be an arc in the network.
It could run either from \(n_i\) to \(n_{i+1}\) or from \(n_{i+1}\) to \(n_i\). (One should think about
one-way roads: even though cars can only go one way, pedestrians are allowed to
walk along the path of the road in either direction.) A network is called connected if
there is a path connecting every pair of nodes (see Figure 14.3). For the remainder
of this chapter, we make the following assumption:

Assumption. The network is connected.

For any arc \((i,j)\), we refer to \(i\) as its tail and \(j\) as its head.

A cycle is a path in which the last node coincides with the first node. A network
is called acyclic if it does not contain any cycles (see Figure 14.4).

A network is a tree if it is connected and acyclic (see Figure 14.5). A network
\((\tilde{\mathcal{N}}, \tilde{\mathcal{A}})\) is called a subnetwork of \((\mathcal{N}, \mathcal{A})\) if \(\tilde{\mathcal{N}} \subset \mathcal{N}\) and \(\tilde{\mathcal{A}} \subset \mathcal{A}\). A subnetwork
\((\tilde{\mathcal{N}}, \tilde{\mathcal{A}})\) is a spanning tree if it is a tree and \(\tilde{\mathcal{N}} = \mathcal{N}\). Since a spanning tree's node
set coincides with the node set of the underlying network, it suffices to refer to a
spanning tree by simply giving its arc set.
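
These definitions translate directly into a test for whether a given arc set is a spanning tree. The sketch below ignores arc directions, as the definition of a path does, and uses a union-find structure (a standard technique, not something described in the text) to detect cycles. The arc set tested is the one used for the spanning-tree example later in this section; the function name is hypothetical.

```python
def is_spanning_tree(nodes, arcs):
    """True if the arcs, directions ignored, form a tree touching every node."""
    parent = {k: k for k in nodes}

    def find(k):
        # Follow parent pointers to the representative of k's component,
        # shortening the path along the way.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for (i, j) in arcs:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False          # i and j already connected: a cycle
        parent[ri] = rj           # merge the two components
    # An acyclic arc set with m - 1 arcs on m nodes must be connected.
    return len(arcs) == len(nodes) - 1


nodes = ["a", "b", "c", "d", "e", "f", "g"]
tree = [("a", "d"), ("f", "a"), ("f", "b"), ("b", "c"), ("g", "b"), ("g", "e")]

print(is_spanning_tree(nodes, tree))
print(is_spanning_tree(nodes, tree + [("a", "c")]))   # closes a cycle
print(is_spanning_tree(nodes, tree[:-1]))             # node e is cut off
```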
Figure 14.3. The network on the left is connected whereas the
one on the right is not.

Figure 14.4. The network on the left contains a cycle whereas
the one on the right is acyclic.

Figure 14.5. The network on the left is a tree whereas the two
on the right are not; they fail in the first case by being disconnected
and in the second by containing a cycle.


Given a network flow problem, any selection of primal flow values that satisfies
the balance equations at every node will be called a balanced flow. It is important to
note that we do not require the flows to be nonnegative to be a balanced flow. That
is, we allow flows to go in the wrong direction. If all the flows are nonnegative,
then a balanced flow is called a feasible flow. Given a spanning tree, a balanced
flow that assigns zero flow to every arc not on the spanning tree will be called a
tree solution. Consider, for example, the tree shown in Figure 14.6. The
numbers shown on the arcs of the spanning tree give the tree solution corresponding to
the supplies/demands shown in Figure 14.1. They were obtained by starting at the
"leaves" of the tree and working "inward." For instance, the flows could be solved
for successively as follows:

\[
\begin{array}{lll}
\text{flow bal at d:} & x_{ad} = 6, & \\
\text{flow bal at a:} & x_{fa} - x_{ad} = 0 & \Longrightarrow \quad x_{fa} = 6, \\
\text{flow bal at f:} & -x_{fa} - x_{fb} = -9 & \Longrightarrow \quad x_{fb} = 3, \\
\text{flow bal at c:} & x_{bc} = 6, & \\
\text{flow bal at b:} & x_{fb} + x_{gb} - x_{bc} = 0 & \Longrightarrow \quad x_{gb} = 3, \\
\text{flow bal at e:} & x_{ge} = 2. &
\end{array}
\]

Figure 14.6. The fat arcs show a spanning tree for the network
in Figure 14.1. The numbers shown on the arcs of the spanning
tree are the primal flows, the numbers shown next to the nodes
are the dual variables, and the numbers shown on the arcs not
belonging to the spanning tree are the dual slacks.

It  is  easy  to  see  that  this  process  always  works.  The  reason  is  that  every  tree  must 
have  at  least  one  leaf  node,  and  deleting  a  leaf  node  together  with  the  edge  leading 
into  it  produces  a  subtree. 
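
The leaf-stripping argument can be turned into a short procedure. The sketch below repeatedly picks a leaf of the remaining tree, reads off the flow on its single arc from the flow-balance equation at that leaf, and passes the consequence on to the neighboring node. The function name and data layout are assumptions made for this illustration; the tree and supplies are those of the example above.

```python
def tree_solution(nodes, tree_arcs, supply):
    """Flows on the arcs of a spanning tree that balance every node."""
    remaining = set(tree_arcs)
    # need[k] is the net inflow still required at node k, initially -b_k.
    need = {k: -supply.get(k, 0) for k in nodes}
    flow = {}
    while remaining:
        degree = {}
        for (i, j) in remaining:
            degree[i] = degree.get(i, 0) + 1
            degree[j] = degree.get(j, 0) + 1
        leaf = next(k for k in sorted(degree) if degree[k] == 1)
        (i, j) = next(a for a in remaining if leaf in a)
        if j == leaf:
            x = need[leaf]     # the arc enters the leaf: inflow equals need
            need[i] += x       # tail i ships x out, so it needs x more in
        else:
            x = -need[leaf]    # the arc leaves the leaf: outflow is -need
            need[j] -= x       # head j receives x
        flow[(i, j)] = x
        remaining.remove((i, j))
    return flow


nodes = ["a", "b", "c", "d", "e", "f", "g"]
tree = [("a", "d"), ("f", "a"), ("f", "b"), ("b", "c"), ("g", "b"), ("g", "e")]
supply = {"a": 0, "b": 0, "c": -6, "d": -6, "e": -2, "f": 9, "g": 5}

flows = tree_solution(nodes, tree, supply)
print(flows)
```

The leaves are processed in a different order than in the hand calculation, but since the tree solution is unique the resulting flows are the same.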

The  above  computation  suggests  that  spanning  trees  are  related  to  bases  in  the 
simplex  method.  Let  us  pursue  this  idea.  Normally,  a  basis  is  an  invertible  square 
submatrix  of  the  constraint  matrix.  But  for  incidence  matrices,  no  such  submatrix 
exists. To see why, note that if we sum together all the rows of \(A\), we get a row
vector of all zeros (since each column of \(A\) has exactly one \(+1\) and one \(-1\)). Of
course, every square submatrix of \(A\) has this same property and so is singular. In
fact, we shall show in a moment that for a connected network, there is exactly one
redundant equation (i.e., the rank of \(A\) is exactly \(m - 1\)).
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Let  us  select  some  node,  say,  the  last  one,  and  delete  the  flow-balance  constraint 
associated  with  this  node  from  the  constraints  defining  the  problem  (since  it  is  re¬ 
dundant  anyway).  Let’s  call  this  node  the  root  node.  Let  A  denote  the  incidence 
matrix  A  without  the  row  corresponding  to  the  root  node  (i.e.,  the  last  row),  and  let 
b  denote  the  supply  /demand  vector  with  the  last  entry  deleted.  The  most  important 
property  of  network  flow  problems  is  summarized  in  the  following  theorem: 

THEOREM  14.1.  A  square  submatrix  of  A  is  a  basis  if  and  only  if  the  arcs  to 
which  its  columns  correspond  form  a  spanning  tree. 

Rather  than  presenting  a  formal  proof  of  this  theorem,  it  is  more  instructive  to 
explain  the  idea  using  the  example  we’ve  been  studying.  Therefore,  consider  the 
spanning  tree  shown  in  Figure  14.6,  and  let  B  denote  the  square  submatrix  of  A 
corresponding  to  this  tree.  The  matrix  B  is  invertible  if  and  only  if  every  system  of 
equations  of  the  form 

Bu  =  p 

has  a  unique  solution.  This  is  exactly  the  type  of  equation  that  we  already  solved  to 
find  the  tree  solution  associated  with  the  spanning  tree: 

$$B x_{\mathcal{B}} = -\tilde{b}.$$


We  solved  this  system  of  equations  by  looking  at  the  spanning  tree  and  realizing  that 
we  could  work  our  way  to  a  solution  by  starting  with  the  leaves  and  working  inward. 
This  process  amounts  to  a  permutation  of  the  rows  and  columns  of  B  to  get  a  lower 
triangular  matrix.  Indeed,  for  the  calculations  given  above,  we  have  permuted  the 
rows  by  P  and  the  columns  by  Q  to  get 


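$$
P B Q^T \;=\;
\begin{array}{c|rrrrrr}
  & (a,d) & (f,a) & (f,b) & (b,c) & (g,b) & (g,e) \\ \hline
d &  1 &    &    &    &   &   \\
a & -1 &  1 &    &    &   &   \\
f &    & -1 & -1 &    &   &   \\
c &    &    &    &  1 &   &   \\
b &    &    &  1 & -1 & 1 &   \\
e &    &    &    &    &   & 1
\end{array}
$$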


The fact that B is invertible is now immediately apparent from the fact that the
permuted matrix is lower triangular. In fact, it has only +1’s and -1’s on the diagonal.
Therefore, we can solve systems of equations involving B without ever having to
do any divisions. Also, since the nonzero off-diagonal entries are also ±1’s, it follows
that we don’t need to do any multiplications either. Every system of equations involving
the matrix B can be solved by a simple sequence of additions and subtractions.
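
The leaves-inward procedure can be sketched in a few lines. This is an illustration rather than the book's software: the function name `tree_flows` is hypothetical, and the supplies below are assumed for the sake of the example (they are not stated in this section). Note that only additions and subtractions are used.

```python
def tree_flows(tree_arcs, supply):
    """Flows on the tree arcs when all nontree flows are zero."""
    remaining = list(tree_arcs)
    net = dict(supply)      # net amount each node must still send out
    flows = {}
    while remaining:
        # Count how many unprocessed tree arcs touch each node.
        degree = {}
        for tail, head in remaining:
            degree[tail] = degree.get(tail, 0) + 1
            degree[head] = degree.get(head, 0) + 1
        leaf = next(v for v, d in degree.items() if d == 1)
        arc = next(a for a in remaining if leaf in a)
        tail, head = arc
        if leaf == tail:
            # The leaf ships its whole net supply along its only arc.
            flows[arc] = net[leaf]
            net[head] += net[leaf]
        else:
            # The leaf receives its whole net demand along its only arc.
            flows[arc] = -net[leaf]
            net[tail] += net[leaf]
        net[leaf] = 0
        remaining.remove(arc)
    return flows

# Spanning tree from the text; supplies (negative = demand) are ASSUMED values.
tree = [("a", "d"), ("f", "a"), ("f", "b"), ("b", "c"), ("g", "b"), ("g", "e")]
supply = {"a": 0, "b": 0, "c": -6, "d": -6, "e": -2, "f": 9, "g": 5}
flows = tree_flows(tree, supply)
print(flows)
```

With integer supplies every computed flow is an integer, since no multiplication or division ever occurs.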

We have shown that, given a spanning tree, the submatrix of $\tilde{A}$ consisting of
the columns corresponding to the arcs in the spanning tree is a basis. The converse
direction (which is less important to our analysis) is relegated to an exercise (see
Exercise 14.12).

Not  only  is  there  a  primal  solution  associated  with  any  basis  but  also  there  is  a 
dual  solution.  Hence,  corresponding  to  any  spanning  tree  there  is  a  dual  solution. 




The dual solution consists of two types of variables: the $y_i$’s and the $z_{ij}$’s. These
variables must satisfy the dual feasibility conditions:

$$y_j - y_i + z_{ij} = c_{ij}, \qquad (i,j) \in \mathcal{A}.$$

By complementarity, $z_{ij} = 0$ for each (i,j) in the spanning tree $\mathcal{T}$. Hence,

$$y_j - y_i = c_{ij}, \qquad (i,j) \in \mathcal{T}.$$

Since a spanning tree on m nodes has m - 1 arcs (why?), these equations define
a system of m - 1 equations in m unknowns. But don’t forget that there was a
redundant equation in the primal problem, which we associated with a specific node
called the root node. Removing that equation and then looking at the dual, we see
that there is not really a dual variable associated with the root node. Or equivalently,
we can just say that the dual variable for the root node is zero. Making this
assignment, we get m equations in m unknowns. These equations can be solved by
starting at the root node and working down the tree.

For  example,  let  node  “g”  be  the  root  node  in  the  spanning  tree  in  Figure  14.6. 
Starting  with  it,  we  compute  the  dual  variables  as  follows: 


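$$
\begin{array}{lll}
 & & y_g = 0, \\
\text{across arc (g,e):} & y_e - y_g = 19 & \Longrightarrow \quad y_e = 19, \\
\text{across arc (g,b):} & y_b - y_g = 33 & \Longrightarrow \quad y_b = 33, \\
\text{across arc (b,c):} & y_c - y_b = 65 & \Longrightarrow \quad y_c = 98, \\
\text{across arc (f,b):} & y_b - y_f = 48 & \Longrightarrow \quad y_f = -15, \\
\text{across arc (f,a):} & y_a - y_f = 56 & \Longrightarrow \quad y_a = 41, \\
\text{across arc (a,d):} & y_d - y_a = 28 & \Longrightarrow \quad y_d = 69.
\end{array}
$$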
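
The same calculation can be written as a traversal of the tree outward from the root. The sketch below uses the tree-arc costs quoted in the computation above; the function name `dual_variables` is an illustrative choice, not part of the book's software.

```python
from collections import deque

def dual_variables(tree_costs, root):
    """Solve y_j - y_i = c_ij on the tree arcs with y_root = 0."""
    y = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for (i, j), cost in tree_costs.items():
            if i == node and j not in y:
                y[j] = y[i] + cost      # cross the arc from tail to head
                queue.append(j)
            elif j == node and i not in y:
                y[i] = y[j] - cost      # cross the arc from head to tail
                queue.append(i)
    return y

# Costs on the tree arcs of Figure 14.6, as used in the computation above.
tree_costs = {("g", "e"): 19, ("g", "b"): 33, ("b", "c"): 65,
              ("f", "b"): 48, ("f", "a"): 56, ("a", "d"): 28}
y = dual_variables(tree_costs, "g")
print(y)
```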

Now that we know the dual variables, the dual slacks for the arcs not in the spanning
tree $\mathcal{T}$ can be computed using

$$z_{ij} = y_i + c_{ij} - y_j, \qquad (i,j) \notin \mathcal{T}$$

(which is just the dual feasibility condition solved for $z_{ij}$). These values are shown
on the nontree arcs in Figure 14.6.

From  duality  theory,  we  know  that  the  current  tree  solution  is  optimal  if  all  the 
flows  are  nonnegative  and  if  all  the  dual  slacks  are  nonnegative.  The  tree  solution 
shown  in  Figure  14.6  satisfies  the  first  condition  but  not  the  second.  That  is,  it 
is  primal  feasible  but  not  dual  feasible.  Hence,  we  can  apply  the  primal  simplex 
method  to  move  from  this  solution  to  an  optimal  one.  We  take  up  this  task  in  the 
next  section. 


3.  The  Primal  Network  Simplex  Method 

Each  of  the  variants  of  the  simplex  method  presented  in  earlier  chapters  of  this 
book  can  be  applied  to  network  flow  problems.  It  would  be  overkill  to  describe 
them  all  here  in  the  context  of  networks.  However,  they  are  all  built  on  two  simple 
algorithms:  the  primal  simplex  method  (for  problems  that  are  primal  feasible)  and 
the  dual  simplex  method  (for  problems  that  are  dual  feasible).  We  discuss  them  both 
in  detail. 
Figure 14.7. The cycle produced by including the entering arc
with the spanning tree. As the flow t on the entering arc increases,
eventually the flow on arc (f,b) becomes zero (when t = 3).
Hence, arc (f,b) is the leaving arc.


We  shall  describe  the  primal  network  simplex  method  by  continuing  with  our 
example.  As  mentioned  above,  the  tree  shown  in  Figure  14.6  is  primal  feasible  but 
not  dual  feasible.  The  basic  idea  that  defines  the  primal  simplex  method  is  to  pick 
a  nontree  arc  that  is  dual  infeasible  and  let  it  enter  the  tree  (i.e.,  become  basic)  and 
then  readjust  everything  so  that  we  still  have  a  tree  solution. 

The First Iteration. For our first pivot, we let arc (a,c) enter the tree using a
primal pivot. In a primal pivot, we add flow to the entering variable, keeping all
other nontree flows set to zero and adjusting the tree flows appropriately to maintain
flow balance. Given any spanning tree, adding an extra arc must create a cycle
(why?). Hence, the current spanning tree together with the entering arc must contain
a cycle. The flows on the cycle must change to accommodate the increasing flow
on the entering arc. The flows on the other tree arcs remain unchanged. In our
example, the cycle is: “a”, “c”, “b”, “f”. This cycle is shown in Figure 14.7 with
flows adjusted to take into account a flow of t on the entering arc. As t increases,
eventually the flow on arc (f,b) decreases to zero. Hence, arc (f,b) is the leaving arc.
Updating the flows is easy; just take t = 3 and adjust the flows appropriately.

With  a  little  thought,  one  realizes  that  the  selection  rule  for  the  leaving  arc  in  a 
primal  pivot  is  as  follows: 


Leaving  arc  selection  rule: 

•  The leaving arc must be oriented along the cycle in the reverse
direction from the entering arc, and

•  Among  all  such  arcs,  it  must  have  the  smallest  flow. 


Also,  the  flows  on  the  cycle  get  updated  as  follows: 


Primal  flows  update: 

•  Flows  oriented  in  the  same  direction  as  the  leaving  arc  are 
decreased  by  the  amount  of  flow  that  was  on  the  leaving 
arc  whereas  flows  in  the  opposite  direction  are  increased 
by  this  amount. 
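
The two rules above can be combined into one routine. The following is a sketch under stated assumptions: the names `tree_path` and `primal_pivot` are hypothetical, and the starting flows other than the flow of 3 on arc (f,b) are assumed values chosen to be consistent with the example.

```python
def tree_path(tree_arcs, start, goal):
    """Tree path from start to goal as (arc, traversed_forward) pairs."""
    stack = [(start, [])]
    seen = {start}
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for tail, head in tree_arcs:
            if tail == node and head not in seen:
                seen.add(head)
                stack.append((head, path + [((tail, head), True)]))
            elif head == node and tail not in seen:
                seen.add(tail)
                stack.append((tail, path + [((tail, head), False)]))
    raise ValueError("the arcs do not form a spanning tree")

def primal_pivot(flows, entering):
    """Apply the leaving-arc rule and the flow update; return (leaving, flows)."""
    tail, head = entering
    # Walk the cycle in the direction of the entering arc: from its head
    # back to its tail along tree arcs.
    cycle = tree_path(list(flows), head, tail)
    reverse = [arc for arc, forward in cycle if not forward]
    if not reverse:
        raise ValueError("no cycle arc opposes the entering arc")
    leaving = min(reverse, key=lambda arc: flows[arc])
    t = flows[leaving]
    new_flows = {}
    for arc, forward in cycle:
        # Arcs oriented like the leaving arc lose t; the others gain t.
        new_flows[arc] = flows[arc] + t if forward else flows[arc] - t
    for arc in flows:
        new_flows.setdefault(arc, flows[arc])   # off-cycle flows are unchanged
    del new_flows[leaving]
    new_flows[entering] = t
    return leaving, new_flows

# Tree flows before the first pivot; values other than x_fb = 3 are ASSUMED.
flows = {("a", "d"): 6, ("f", "a"): 6, ("f", "b"): 3,
         ("b", "c"): 6, ("g", "b"): 3, ("g", "e"): 2}
leaving, new_flows = primal_pivot(flows, ("a", "c"))
print(leaving, new_flows)
```

On this data the routine selects arc (f,b) as the leaving arc with t = 3, in agreement with Figure 14.7.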
Figure 14.8. The two disjoint trees. Primal and dual values that
remained unchanged are shown, whereas those that need to be
updated are shown as question marks.


The  next  issue  is  how  to  update  the  dual  variables.  To  this  end,  note  that  if 
we  delete  the  leaving  arc  from  the  spanning  tree  (without  concurrently  adding  the 
entering  arc),  we  disconnect  it  into  two  disjoint  trees.  In  our  example,  one  tree 
contains  nodes  “a”,  “d”  and  “f”  while  the  second  tree  contains  the  other  nodes. 
Figure 14.8 shows the two disjoint trees. Recalling that the dual variables are
calculated starting with the root node and working up the spanning tree, it is clear that the
dual variables on the subtree containing the root node remain unchanged, whereas
those on the other subtree must change. For the current pivot, the other subtree
consists of nodes “a”, “d”, and “f”. They all get incremented by the same fixed
amount, since the only change is that the arc by which we bridged from the
root-containing tree to this other tree has changed from the leaving arc to the entering
arc. Looking at node “a” and using tildes to denote values after being changed, we
see that


$$\tilde{y}_a = \tilde{y}_c - c_{ac} = y_c - c_{ac},$$

whereas

$$z_{ac} = y_a + c_{ac} - y_c.$$

Combining these two equations, we get

$$\tilde{y}_a = y_a - z_{ac}.$$

That is, the dual variable at node “a” gets decremented by $z_{ac} = -9$. Of course,
all of the dual variables on this subtree get decremented by this same amount. In
general, the dual variable update rule can be stated as follows:
Figure 14.9. The tree solution at the end of the first iteration.


Dual  variables  update: 

•  If the entering arc crosses from the root-containing tree to
the non-root-containing tree, then increase all dual variables
on the non-root-containing tree by the dual slack of
the entering arc.

•  Otherwise,  decrease  these  dual  variables  by  this  amount. 
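
The rule can be sketched as follows, using the dual variables computed earlier for Figure 14.6 and the first pivot of the example (entering arc (a,c) with dual slack -9, non-root-containing subtree {a, d, f}). The function name `update_duals` is illustrative.

```python
def update_duals(y, non_root_subtree, entering, z_entering):
    """Shift the dual variables on the subtree that does not contain the root."""
    tail, head = entering
    # The entering arc crosses from the root-containing tree into the other
    # tree exactly when its head lies in the non-root-containing subtree.
    sign = 1 if head in non_root_subtree else -1
    return {node: value + sign * z_entering if node in non_root_subtree else value
            for node, value in y.items()}

y = {"a": 41, "b": 33, "c": 98, "d": 69, "e": 19, "f": -15, "g": 0}
new_y = update_duals(y, {"a", "d", "f"}, ("a", "c"), -9)
print(new_y)
```

Here arc (a,c) crosses toward the root-containing tree, so the three dual variables are decreased by -9, that is, increased by 9.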

Finally, we must update the dual slacks. The only dual slacks that change are
those that span across the two trees since, for these arcs, either the head or the tail
dual variable changes, while the other does not. Those that span the two subtrees
in the same direction as the entering arc must be decreased by $z_{ac}$, whereas those
that bridge the two trees in the opposite direction must be increased by this amount.
For our example, six nontree arcs, (f,g), (f,b), (f,c), (d,b), (d,e), and (a,e), span in
the same direction as the entering arc. They all must be decreased by -9. That is,
they must be increased by 9. For example, the dual slack on arc (f,c) changes from
-5 to 4. Only one arc, (b,a), spans in the other direction. It must be decreased by
9. The updated solution is shown in Figure 14.9. The general rule for updating the
dual slacks is as follows:

Dual  slacks  update: 

•  The  dual  slacks  corresponding  to  those  arcs  that  bridge  in 
the  same  direction  as  the  entering  arc  get  decremented  by 
the  old  dual  slack  on  the  entering  arc,  whereas  those  that 
correspond  to  arcs  bridging  in  the  opposite  direction  get 
incremented  by  this  amount. 
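
A minimal sketch of this rule, applied to a few arcs from the first pivot. The slacks -9 on (a,c) and -5 on (f,c) are quoted in the text; the slack 0 on the leaving arc (f,b) holds because it was a tree arc; the old slack -1 on (b,a) is inferred from its updated value of -10. The function name `update_slacks` is illustrative.

```python
def update_slacks(z, non_root_subtree, entering):
    """Update dual slacks on arcs bridging the two subtrees."""
    def crossing(arc):
        # +1 if the arc enters the non-root subtree, -1 if it leaves it,
        # 0 if both endpoints lie in the same subtree.
        tail, head = arc
        return (head in non_root_subtree) - (tail in non_root_subtree)

    z_in = z[entering]
    d_in = crossing(entering)
    new_z = {}
    for arc, slack in z.items():
        d = crossing(arc)
        if d == 0:
            new_z[arc] = slack            # does not bridge: unchanged
        elif d == d_in:
            new_z[arc] = slack - z_in     # same direction as the entering arc
        else:
            new_z[arc] = slack + z_in     # opposite direction
    return new_z

z = {("a", "c"): -9, ("f", "c"): -5, ("b", "a"): -1, ("f", "b"): 0}
new_z = update_slacks(z, {"a", "d", "f"}, ("a", "c"))
print(new_z)
```

The entering arc itself ends up with slack zero, as it must once it becomes a tree arc.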

The Second Iteration. The tree solution shown in Figure 14.9 has only one
remaining infeasibility: $z_{ba} = -10$. Arc (b,a) must therefore enter the spanning
tree. Adding it, we create a cycle consisting of nodes “a”, “b”, and “c”. The leaving
arc must be pointing in the opposite direction from the entering arc. Here, there is
only one such arc, (b,c). It must be the leaving arc. The leaving arc’s flow decreases
from 3 to 0. The flow on the other two cycle arcs must increase by 3 to preserve
flow balance.

Figure 14.10. The two disjoint subtrees arising in the second iteration.

Figure 14.11. The tree solution at the end of the second iteration.
To get from the spanning tree in Figure 14.9 to here, we let
arc (b,a) enter and arc (b,c) leave.

The two subtrees formed by removing the leaving arc are shown in Figure 14.10.
The dual variables on the non-root-containing subtree get incremented by the dual
slack on the entering arc, $z_{ba} = -10$. The dual slacks for the spanning arcs also
change by 10 either up or down depending on which way they bridge the two
subtrees. The resulting tree solution is shown in Figure 14.11.
Figure 14.12. The tree solution at the end of the third iteration.
To get from the spanning tree in Figure 14.11 to here, we let arc
(f,b) enter and arc (f,a) leave. This tree solution is the optimal
solution to the problem.


The Third and Final Iteration. The tree solution shown in Figure 14.11 has
one infeasibility: $z_{fb} = -1$. Hence, arc (f,b) must enter the spanning tree. The
leaving arc must be (f,a). Leaving the details of updating to the reader, the resulting
tree solution is shown in Figure 14.12. It is both primal and dual feasible and hence
optimal.


4.  The  Dual  Network  Simplex  Method 

In the previous section, we developed simple rules for the primal network simplex
method, which is used in situations where the tree solution is primal feasible
but not dual feasible. When a tree solution is dual feasible but not primal feasible,
then the dual network simplex method can be used. We shall define this method
now. Consider the tree solution shown in Figure 14.13. It is dual feasible but not
primal feasible (since $x_{db} < 0$). The basic idea that defines the dual simplex method
is to pick a tree arc that is primal infeasible and let it leave the spanning tree (i.e.,
become nonbasic) and then readjust everything to preserve dual feasibility.

The First Iteration. For the first iteration, we need to let arc (d,b) leave the
spanning tree using a dual pivot, which is defined as follows. Removing arc (d,b)
disconnects the spanning tree into two disjoint subtrees. The entering arc must be
one of the arcs that spans across the two subtrees so that it can reconnect them into
a spanning tree. That is, it must be one of

(a,e),  (a,d),  (b,e),  or  (g,e). 

See  Figure  14.14.  To  see  how  to  decide  which  it  must  be,  we  need  to  consider 
carefully  the  impact  of  each  possible  choice. 
Figure 14.13. A tree solution that is dual feasible but not primal feasible.


Figure 14.14. The two subtrees for the first pivot of the dual
simplex method.


To this end, let us consider the general situation. As mentioned above, the spanning
tree with the leaving arc removed consists of two disjoint trees. The entering
arc must reconnect these two trees.

First, consider a reconnecting arc that connects in the same direction as the
leaving arc. When we add flow to this prospective entering arc, we will have to
decrease flow on the leaving arc to maintain flow balance. Therefore, the leaving
arc’s flow, which is currently negative, can’t be raised to zero. That is, the leaving
arc can’t leave. This is no good.

Figure 14.15. The tree solution after the first pivot.

Now  suppose  that  the  reconnecting  arc  connects  in  the  opposite  direction.  If 
it  were  to  be  the  entering  arc,  then  its  dual  slack  would  drop  to  zero.  All  other 
reconnecting  arcs  pointing  in  the  same  direction  would  drop  by  the  same  amount. 
To  maintain  nonnegativity  of  all  the  others,  we  must  pick  the  one  that  drops  the 
least.  We  can  summarize  the  rule  as  follows: 

Entering  arc  selection  rule: 

•  The entering arc must bridge the two subtrees in the opposite
direction from the leaving arc, and

•  Among  all  such  arcs,  it  must  have  the  smallest  dual  slack. 

In our example, all bridging arcs point in the opposite direction from the leaving
arc. The one with the smallest dual slack is (g,e), whose slack is $z_{ge} = 9$. This arc
must be the entering arc.
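
The entering arc selection rule can be sketched as below. Only the slack $z_{ge} = 9$ is taken from the text; the other three slacks are hypothetical placeholders (any values larger than 9 would do), and the node split {d, e} is inferred from the list of bridging arcs. The function name `dual_entering_arc` is illustrative.

```python
def dual_entering_arc(tail_side, bridging_slacks):
    """Pick the entering arc for a dual pivot.

    tail_side: nodes of the subtree containing the tail of the leaving arc.
    bridging_slacks: dual slacks of the nontree arcs that bridge the subtrees.
    """
    # The leaving arc points out of tail_side, so an arc in the opposite
    # direction must point into tail_side.
    candidates = {arc: z for arc, z in bridging_slacks.items()
                  if arc[1] in tail_side and arc[0] not in tail_side}
    if not candidates:
        raise ValueError("no arc bridges in the opposite direction")
    return min(candidates, key=candidates.get)

# Leaving arc (d,b): its tail "d" lies in the subtree {d, e} (inferred).
tail_side = {"d", "e"}
bridging_slacks = {
    ("g", "e"): 9,      # from the text
    ("a", "e"): 12,     # hypothetical value
    ("a", "d"): 15,     # hypothetical value
    ("b", "e"): 20,     # hypothetical value
}
entering = dual_entering_arc(tail_side, bridging_slacks)
print(entering)
```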

We have now determined both the entering and leaving arcs. Hence, the new
spanning tree is determined and therefore, in principle, all the variables associated
with this new spanning tree can be computed. Furthermore, the rules for determining
the new values by updating from the previous ones are the same as in the primal
network simplex method. The resulting tree solution is shown in Figure 14.15.

The  Second  Iteration.  For  the  second  pivot,  there  are  two  choices  for  the  leaving 
arc:  (g,b)  and  (d,e).  Using  the  most  infeasible,  we  choose  (d,e).  We  remove  this  arc 
from  the  spanning  tree  to  produce  two  subtrees.  One  of  the  subtrees  consists  of  just 
the  node  “d”  all  by  itself  while  the  other  subtree  consists  of  the  rest  of  the  nodes. 
Remembering  that  the  reconnecting  arc  must  bridge  the  two  subtrees  in  the  opposite 
direction,  the  only  choice  is  (a,d).  So  this  arc  is  the  entering  arc.  Making  the  pivot, 
we  arrive  at  the  optimal  tree  solution  shown  in  Figure  14.12. 

5.  Putting It All Together

As  we  saw  in  Chap.  5,  for  linear  programming  the  primal  and  the  dual  simplex 
methods  form  the  foundation  on  which  one  can  build  a  few  different  variants  of  the 
simplex  method.  The  same  is  true  here  in  the  context  of  network  flows. 

For  example,  one  can  build  a  two-phased  procedure  in  which  one  first  uses 
the  dual  network  simplex  method  (with  costs  artificially  and  temporarily  altered  to 
ensure  dual  feasibility  of  an  initial  tree  solution)  to  find  a  primal  feasible  solution 
and  then  uses  the  primal  network  simplex  method  to  move  from  the  feasible  solution 
to  an  optimal  one. 

Alternatively,  one  can  use  the  primal  network  simplex  method  (with  supplies 
temporarily  altered  to  ensure  primal  feasibility  of  an  initial  tree  solution)  to  find 
a  dual  feasible  solution  and  then  use  the  dual  network  simplex  method  (with  the 
original  supplies)  to  move  from  the  dual  feasible  solution  to  an  optimal  one. 

Finally, as described for linear programming in Chap. 7, one can define a parametric
self-dual method in which primal pivots and dual pivots are intermingled as
needed so as to reduce a perturbation parameter μ from ∞ to zero.

Since  there  is  nothing  new  in  how  one  builds  the  network  versions  of  these 
algorithms  from  the  basic  primal  and  dual  simplex  pivots,  we  don’t  go  through 
any examples here. Instead, we just mention one final observation about the dual
variables, the $y_i$’s. Namely, they are not needed anywhere in the performance of
a primal or a dual pivot. Hence, their calculation is entirely optional and can be
skipped altogether or simply deferred to the end.

For completeness, we end this section by giving a step-by-step description of
the self-dual network simplex method. The steps are as follows:

(1)  Identify a spanning tree; any one will do (see Exercise 14.14). Also identify
a root node.

(2)  Compute  initial  primal  flows  on  the  tree  arcs  by  assuming  that  nontree 
arcs  have  zero  flow  and  the  total  flow  at  each  node  must  be  balanced.  For 
this  calculation,  the  computed  primal  flows  may  be  negative.  In  this  case, 
the  initial  primal  solution  is  not  feasible.  The  calculation  is  performed 
working  from  leaf  nodes  inward. 

(3)  Compute initial dual values by working out from the root node along tree
arcs using the formula

$$y_j = y_i + c_{ij},$$

which is valid on tree arcs, since the dual slacks vanish on these arcs.

(4)  Compute initial dual slacks on each nontree arc using the formula

$$z_{ij} = y_i + c_{ij} - y_j.$$

Again, some of the $z_{ij}$’s might be negative. This is okay (for now),
but it is important that they satisfy the above equality.

(5)  Perturb each primal flow and each dual slack that has a negative initial
value by adding a positive scalar μ to each such value.

(6)  Identify a range $\mu_{\mathrm{MIN}} \le \mu \le \mu_{\mathrm{MAX}}$ over which the current solution is
optimal (on the first iteration, $\mu_{\mathrm{MAX}}$ will be infinite).

(7)  Check the stopping rule: if $\mu_{\mathrm{MIN}} \le 0$, then set μ = 0 to recover an optimal
solution. While not optimal, perform each of the remaining steps and then
return to recheck this condition.

(8)  Select an arc associated with the inequality $\mu_{\mathrm{MIN}} \le \mu$ (if there are several,
pick one arbitrarily). If this arc is a nontree arc, then the current pivot is a
primal pivot. If, on the other hand, it is a tree arc, then the pivot is a dual
pivot.

(a)  If the pivot is a primal pivot, the arc identified above is the entering
arc. Identify the associated leaving arc as follows. First, add the
entering arc to the tree. With this arc added, there must be a cycle
consisting of the entering arc and other tree arcs. The leaving arc is
chosen from those arcs on the cycle that go in the opposite direction
from the entering arc; among all such arcs, it is the one with the
smallest flow (evaluated at $\mu = \mu_{\mathrm{MIN}}$).

(b)  If the pivot is a dual pivot, the arc identified above is the leaving arc.
Identify the associated entering arc as follows. First, delete the leaving
arc from the tree. This deletion splits the tree into two subtrees.
The entering arc must bridge these two trees in the opposite direction
to the leaving arc, and, among such arcs, it must be the one with the
smallest dual slack (evaluated at $\mu = \mu_{\mathrm{MIN}}$).

(9)  Update primal flows as follows. Add the entering arc to the tree. This
addition creates a cycle containing both the entering and leaving arcs. Adjust
the flow on the leaving arc to zero, and then adjust the flows on each of
the other cycle arcs as necessary to maintain flow balance.

(10)  Update  dual  variables  as  follows.  Delete  the  leaving  arc  from  the  old 
tree.  This  deletion  splits  the  old  tree  into  two  subtrees.  Let  Tu  denote 
the  subtree  containing  the  tail  of  the  entering  arc,  and  let  Tv  denote  the 
subtree  containing  its  head.  The  dual  variables  for  nodes  in  Tu  remain 
unchanged,  but  the  dual  variables  for  nodes  in  Tv  get  incremented  by  the 
old  dual  slack  on  the  entering  arc. 

(11)  Update  dual  slacks  as  follows.  All  dual  slacks  remain  unchanged  except 
for  those  associated  with  nontree  arcs  that  bridge  the  two  subtrees  Tu 
and  Tv.  The  dual  slacks  corresponding  to  those  arcs  that  bridge  in  the 
same  direction  as  the  entering  arc  get  decremented  by  the  old  dual  slack 
on  the  entering  arc,  whereas  those  that  correspond  to  arcs  bridging  in  the 
opposite  direction  get  incremented  by  this  amount. 

As  was  said  before  and  should  now  be  clear,  there  is  no  need  to  update  the  dual 
variables  from  one  iteration  to  the  next;  that  is,  step  10  can  be  skipped. 
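
Steps (6) through (8) amount to a ratio test on the perturbed values. The sketch below is an illustration under assumptions, not the book's implementation: each primal flow or dual slack is represented as a hypothetical pair (constant, coefficient) standing for constant + coefficient · μ, the arc data are invented for the example, and the name `mu_range` is illustrative.

```python
import math
from fractions import Fraction

def mu_range(perturbed):
    """Range of mu over which every value constant + coefficient*mu is >= 0.

    Returns (mu_min, mu_max, binding_arc), where binding_arc is an arc whose
    inequality determines mu_min (None if there is no lower bound).
    """
    mu_min, mu_max, binding = -math.inf, math.inf, None
    for arc, (const, coef) in perturbed.items():
        if coef > 0:
            bound = Fraction(-const, coef)      # value >= 0 iff mu >= bound
            if bound > mu_min:
                mu_min, binding = bound, arc
        elif coef < 0:
            mu_max = min(mu_max, Fraction(-const, coef))
        elif const < 0:
            raise ValueError("a value without perturbation is negative")
    return mu_min, mu_max, binding

# Hypothetical data: flows on two tree arcs and slacks on two nontree arcs.
tree_arcs = {("g", "b"), ("d", "b")}
perturbed = {
    ("g", "b"): (3, 0),     # flow 3, not perturbed
    ("d", "b"): (-4, 1),    # flow -4 + mu
    ("a", "c"): (-9, 1),    # slack -9 + mu
    ("b", "e"): (21, 0),    # slack 21, not perturbed
}
mu_min, mu_max, binding = mu_range(perturbed)
pivot_type = "dual" if binding in tree_arcs else "primal"
print(mu_min, mu_max, binding, pivot_type)
```

In this example the binding inequality belongs to a nontree arc, so step (8) calls for a primal pivot with that arc entering.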


6.  The  Integrality  Theorem 

In this section, we consider network flow problems for which all the supplies
and demands are integers. Such problems are called network flow problems with
integer data. As we explained in Sect. 14.2, for network flow problems, basic primal
solutions are computed without any multiplication or division. The following
important theorem follows immediately from this property:

THEOREM  14.2.  Integrality  Theorem.  For  network  flow  problems  with  integer 
data,  every  basic  feasible  solution  and,  in  particular,  every  basic  optimal  solution 
assigns  integer  flow  to  every  arc. 

This  theorem  is  important  because  many  real-world  network  flow  problems 
have  integral  supplies/demands  and  require  their  solutions  to  be  integral  too.  This 
integrality  restriction  typically  occurs  when  one  is  shipping  indivisible  units  through 
a network. For example, it wouldn’t make sense to ship one third of a car from an
automobile assembly plant to one dealership with the other two thirds going to
another dealership.

Problems  that  are  linear  programming  problems  with  the  additional  stipulation 
that  the  optimal  solution  values  must  be  integers  are  called  integer  programming 
problems.  Generally  speaking,  these  problems  are  much  harder  to  solve  than  linear 
programming  problems  (see  Chap.  23).  However,  if  the  problem  is  a  network  flow 
problem  with  integer  data,  it  can  be  solved  efficiently  using  the  simplex  method 
to  compute  a  basic  optimal  solution,  which  the  integrality  theorem  tells  us  will  be 
integer  valued. 

6.1.  König’s Theorem. In addition to its importance in real-world optimization
problems, the integrality theorem also has many applications to the branch of
mathematics called combinatorics. We illustrate with just one example.

THEOREM 14.3. König’s Theorem. Suppose that there are n girls and n boys,
that every girl knows exactly k boys, and that every boy knows exactly k girls. Then
n marriages can be arranged with everybody knowing his or her spouse.

Before  proving  this  theorem  it  is  important  to  clarify  its  statement  by  saying 
that  the  property  of  “knowing”  is  symmetric  (for  example,  knowing  in  the  biblical 
sense).  That  is,  if  a  certain  girl  knows  a  certain  boy,  then  this  boy  also  knows  this 
girl. 

Proof. Consider a network with nodes $g_1, g_2, \ldots, g_n, b_1, b_2, \ldots, b_n$ and an
arc from $g_i$ to $b_j$ if girl i and boy j know each other. Assign one unit of supply to
each girl node and a unit of demand to each boy node. Assign arbitrary objective
coefficients to create a well-defined network flow problem. The problem is guaranteed
to be feasible: just put a flow of 1/k on each arc (the polygamists in the group
might prefer this nonintegral solution). By the integrality theorem, the problem has
an integer-valued solution. Clearly, the flow on each arc must be either zero or one.
Also, each girl node is the tail of exactly one arc having a flow of one. This arc
points to her intended mate. □
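
The conclusion of the theorem can be checked on a small instance. Note that the sketch below does not use the network simplex method from the proof; it substitutes a standard augmenting-path search for bipartite matching, which is shorter to write. The instance (n = 5, k = 2, girl g knows boys g and g + 1 modulo n) and the name `perfect_matching` are illustrative assumptions.

```python
def perfect_matching(knows):
    """Augmenting-path bipartite matching; returns a {boy: girl} pairing."""
    match = {}      # boy -> girl currently paired with him

    def try_assign(girl, visited):
        for boy in knows[girl]:
            if boy in visited:
                continue
            visited.add(boy)
            # Take the boy if he is free, or if his partner can move elsewhere.
            if boy not in match or try_assign(match[boy], visited):
                match[boy] = girl
                return True
        return False

    for girl in knows:
        try_assign(girl, set())
    return match

# Hypothetical instance: every girl knows exactly k boys and vice versa.
n, k = 5, 2
knows = {g: [(g + s) % n for s in range(k)] for g in range(n)}
match = perfect_matching(knows)
print(match)
```

Every girl and every boy appears in exactly one pair, and each pair consists of two people who know each other.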


Exercises 

In  solving  the  following  problems,  the  network  pivot  tool  can  be  used  to  check 
your  arithmetic: 

www.princeton.edu/~rvdb/JAVA/network/nettool/netsimp.html

14.1  Consider the following network flow problem:


Numbers  shown  above  the  nodes  are  supplies  (negative  values  represent 
demands)  and  numbers  shown  above  the  arcs  are  unit  shipping  costs.  The 
darkened  arcs  form  a  spanning  tree. 

(a)  Compute  primal  flows  for  each  tree  arc. 

(b)  Compute  dual  variables  for  each  node. 

(c)  Compute  dual  slacks  for  each  nontree  arc. 

14.2  Consider  the  tree  solution  for  the  following  minimum  cost  network  flow 
problem: 


The  numbers  on  the  tree  arcs  represent  primal  flows  while  numbers  on  the 
nontree  arcs  are  dual  slacks. 

(a)  Using  the  largest-coefficient  rule  in  the  dual  network  simplex  method, 
what  is  the  leaving  arc? 

(b)  What  is  the  entering  arc? 

(c)  After  one  pivot,  what  is  the  new  tree  solution? 

14.3  Consider the following network flow problem:


The numbers above the nodes are supplies (negative values represent demands)
and numbers shown above the arcs are unit shipping costs. The
darkened arcs form a spanning tree.

(a)  Compute  primal  flows  for  each  tree  arc. 

(b)  Compute  dual  variables  for  each  node. 

(c)  Compute  dual  slacks  for  each  nontree  arc. 


14.4  Consider  the  tree  solution  for  the  following  minimum  cost  network  flow 
problem: 


The numbers on the tree arcs represent primal flows while numbers on the
nontree arcs are dual slacks.

(a)  Using the largest-coefficient rule in the primal network simplex
method, what is the entering arc?

(b)  What  is  the  leaving  arc? 

(c)  After  one  pivot,  what  is  the  new  tree  solution? 


14.5  Consider  the  tree  solution  for  the  following  minimum  cost  network  flow 
problem: 


The  numbers  on  the  tree  arcs  represent  primal  flows  while  numbers  on  the 
nontree  arcs  are  dual  slacks. 

(a)  Using  the  largest-coefficient  rule  in  the  dual  network  simplex  method, 
what  is  the  leaving  arc? 

(b)  What  is  the  entering  arc? 

(c)  After  one  pivot,  what  is  the  new  tree  solution? 


14.6  Solve  the  following  network  flow  problem  starting  with  the  spanning  tree 
shown. 


The numbers displayed next to nodes are supplies (+)/demands (-). Numbers
on arcs are costs. Missing data should be assumed to be zero. The
bold arcs represent an initial spanning tree.

14.7  Solve  Exercise  2.11  using  the  self-dual  network  simplex  method. 

14.8  Using  today’s  date  (MMYY)  for  the  seed  value,  solve  ten  problems  using 
the  network  simplex  pivot  tool: 

www.princeton.edu/~rvdb/JAVA/network/challenge/netsimp.html 


14.9  Consider  the  following  tree  solution  for  a  minimum  cost  network  flow 
problem: 


As  usual,  bold  arcs  represent  arcs  on  the  spanning  tree,  numbers  next  to 
the  bold  arcs  are  primal  flows,  numbers  next  to  non-bold  arcs  are  dual 
slacks,  and  numbers  next  to  nodes  are  dual  variables. 

(a)  For what values of μ is this tree solution optimal?

(b)  What are the entering and leaving arcs?

(c)  After one pivot, what is the new tree solution?

(d)  For what values of μ is the new tree solution optimal?


14.10  Consider  the  following  tree  solution  for  a  minimum  cost  network  flow 
problem: 


(a)  For what values of μ is this tree solution optimal?

(b)  What are the entering and leaving arcs?

(c)  After one pivot, what is the new tree solution?

(d)  For what values of μ is the new tree solution optimal?

14.11  Consider the following minimum cost network flow problem:


As usual, the numbers on the arcs represent the flow costs and numbers
at the nodes represent supplies (demands are shown as negative supplies).
The arcs shown in bold represent a spanning tree. If the solution
corresponding to this spanning tree is optimal, prove it; otherwise, find an
optimal solution using this tree as the initial spanning tree.

14.12  Suppose  that  a  square  submatrix  of  A  is  invertible.  Show  that  the  arcs 
corresponding  to  the  columns  of  this  submatrix  form  a  spanning  tree. 

14.13 Show that a spanning tree on $m$ nodes must have exactly $m - 1$ arcs.

14.14 Define an algorithm that takes as input a network and either finds a
spanning tree or proves that the network is not connected.

14.15  Give  an  example  of  a  minimum-cost  network  flow  problem  with  all  arc 
costs  positive  and  the  following  counterintuitive  property:  if  the  supply 
at  a  particular  source  node  and  the  demand  at  a  particular  sink  node  are 
simultaneously  reduced  by  one  unit,  then  the  optimal  cost  increases. 

14.16 Consider a possibly disconnected network $(\mathcal{N}, \mathcal{A})$. Two nodes $i$ and $j$ in
$\mathcal{N}$ are said to be connected if there is a path from $i$ to $j$ (recall that paths
can traverse arcs backwards or forwards). We write $i \sim j$ if $i$ and $j$ are
connected.

(a) Show that “$\sim$” defines an equivalence relation. That is, it has the
following three properties:

(i) (Reflexivity) for all $i \in \mathcal{N}$, $i \sim i$;

(ii) (Symmetry) for all $i, j \in \mathcal{N}$, $i \sim j$ implies that $j \sim i$;

(iii) (Transitivity) for all $i, j, k \in \mathcal{N}$, $i \sim j$ and $j \sim k$ implies that
$i \sim k$.

Using the equivalence relation, we can partition $\mathcal{N}$ into a collection of
equivalence classes $\mathcal{N}_1, \mathcal{N}_2, \ldots, \mathcal{N}_k$ such that two nodes are
connected if and only if they belong to the same subset. The number $k$ is
called the number of connected components.

(b) Show that the rank of the node-arc incidence matrix $A$ is exactly
$m - k$ (recall that $m$ is the number of rows of $A$).

Figure 14.16. The primal network has nodes “a” through “f”.
The corresponding dual network has nodes “A” through “D” (node
“A” is “at infinity”). A primal spanning tree is shown. It consists
of five arcs: (a,b), (f,b), (b,e), (e,d), and (c,d). The corresponding
dual spanning tree consists of three arcs: (B,A), (A,C),
and (D,A). Primal costs are shown along the primal arcs and
supplies/demands are shown at the primal nodes.


14.17  One  may  assume  without  loss  of  generality  that  every  node  in  a  minimum 
cost  network  flow  problem  has  at  least  two  arcs  associated  with  it.  Why? 

14.18  The  sum  of  the  dual  slacks  around  any  cycle  is  a  constant.  What  is  that 
constant? 

14.19 Planar Networks. A network is called planar if the nodes and arcs can be
laid out on the two-dimensional plane in such a manner that no two arcs
cross each other (it is allowed to draw the arcs as curves if necessary). All
of the networks encountered so far in this chapter have been planar.
Associated with each planar network is a geometrically defined dual network.
The purpose of this problem is to establish the following interesting fact:

A dual network simplex pivot is precisely a primal network
simplex pivot applied to the dual network.

Viewed geometrically, the nodes of a planar graph are called vertices
and the arcs are called edges. Consider a specific connected planar
network. If one were to delete the vertices and the edges from the plane,
one would be left with a disjoint collection of subsets of the plane. These
subsets are called faces. Note that there is one unbounded face. It is a
face just like the other bounded ones. An example of a connected planar
network with its faces labeled A through D is shown in Figure 14.16.

Dual nodes. Associated with each connected planar network is a dual
network defined by interchanging vertices and faces. That is, place a dual
vertex in the center of each primal face. Note: the dual vertex
corresponding to the unbounded primal face could be placed anywhere in the
unbounded face but we choose to put it at infinity. In this way, dual edges
(defined next) that have a head or a tail at this node can run off to infinity
in any direction.

Dual  arcs.  Connect  with  a  dual  edge  any  pair  of  dual  nodes  whose 
corresponding  primal  faces  share  an  edge.  Each  dual  edge  crosses  exactly 
one  primal  edge.  The  directionality  of  the  dual  edge  is  determined  as 
follows:  first,  place  a  vector  along  the  corresponding  primal  edge  pointing 
in  the  direction  of  the  primal  arc,  and  then  rotate  it  counterclockwise  until 
it  is  tangent  to  the  dual  edge.  The  vector  now  defines  the  direction  for  the 
dual  arc. 

Dual spanning tree. Consider a spanning tree on the primal network
and suppose that a primal-dual tree solution is given. We define a
spanning tree on the dual network as follows. A dual edge is on the dual
network’s spanning tree if and only if the corresponding primal edge is
not on the primal network’s spanning tree.

Dual flows and dual dual-slacks. The numerical arc data for the dual
network is inherited directly from the primal. That is, flows on the dual
tree arcs are exactly equal to the dual slacks on the associated primal
nontree arcs. And, the dual slacks on the dual nontree arcs are exactly equal
to the primal flows on the associated primal tree arcs. Having specified
numerical data on the arcs of the dual network, it is fairly straightforward
to determine values for supplies/demands at the nodes and shipping costs
along the arcs that are consistent with these numerical values.

(a)  Which  of  the  following  networks  are  planar: 


(b)  A  network  is  called  complete  if  there  is  an  arc  between  every  pair 
of  nodes.  If  a  complete  network  with  m  nodes  is  planar,  then  every 
network  with  m  nodes  is  planar.  Prove  it. 

(c)  Show  that  a  nonplanar  network  must  have  five  or  more  nodes. 

(d) As always, let $m$ denote the number of nodes and let $n$ denote the
number of arcs in a network. Let $f$ denote the number of faces in a
planar network. Show by induction on $f$ that $m = n - f + 2$.

(e) Show that the dual spanning tree defined above is in fact a spanning
tree.

(f)  Show  that  a  dual  pivot  for  a  minimum  cost  network  flow  problem 
defined  on  the  primal  network  is  precisely  the  same  as  a  primal  pivot 
for  the  corresponding  network  flow  problem  on  the  dual  network. 

(g)  Using  the  cost  and  supply/demand  information  given  for  the  primal 
problem  in  Figure  14.16,  write  down  the  primal  problem  as  a  linear 
programming  problem. 

(h)  Write  down  the  dual  linear  programming  problem  that  one  derives 
algebraically  from  the  primal  linear  programming  problem. 

(i)  Using  the  spanning  tree  shown  in  Figure  14.16,  compute  the  primal 
flows,  dual  variables,  and  dual  slacks  for  the  network  flow  problem 
associated  with  the  primal  network. 

(j) Write down the flow and slacks for the network flow problem
associated with the dual network.

(k) Find arc costs and node supplies/demands for the dual network that
are consistent with the flows and slacks just computed.

(l) Write down the linear programming problem associated with the
network flow problem on the dual network.

Notes 

The  classical  reference  is  Ford  and  Fulkerson  (1962).  More  recent  works  include 
the  books  by  Christofides  (1975),  Lawler  (1976),  Bazaraa  et  al.  (1977), 
Kennington  and  Helgason  (1980),  Jensen  and  Barnes  (1980),  Bertsekas  (1991),  and 
Ahuja  et  al.  (1993). 

The two “original” algorithms for solving minimum-cost network flow problems
are  the  network  simplex  method  developed  by  Dantzig  (1951a)  and  the  primal-dual 
method  developed  by  Ford  and  Fulkerson  (1958).  The  self-dual  algorithm  described 
in  this  chapter  is  neither  of  these.  In  fact,  it  resembles  the  “out-of-kilter”  method 
described  by  Ford  and  Fulkerson  (1962). 


CHAPTER  15 


Applications 


In  this  chapter,  we  discuss  briefly  the  most  important  applications  of  network 
flow  problems. 


1.  The  Transportation  Problem 

The network flow problem, when thought of as representing the shipment of
goods along a transportation network, is called the transshipment problem. An
important special case is when the set of nodes $\mathcal{N}$ can be partitioned into two sets
$\mathcal{S}$ and $\mathcal{D}$,

$$\mathcal{N} = \mathcal{S} \cup \mathcal{D}, \qquad \mathcal{S} \cap \mathcal{D} = \emptyset,$$

such that every arc in $\mathcal{A}$ has its tail in $\mathcal{S}$ and its head in $\mathcal{D}$. The nodes in $\mathcal{S}$ are
called source (or supply) nodes, while those in $\mathcal{D}$ are called destination (or demand)
nodes. Such graphs are called bipartite graphs (see Figure 15.1). A network flow problem
on such a bipartite graph is called a transportation problem.

In order for a transportation problem to be feasible, the supply must be
nonnegative at every supply node, and the demand must be nonnegative at every demand
node. That is,

$$b_i \ge 0 \ \text{ for } i \in \mathcal{S}, \qquad b_i \le 0 \ \text{ for } i \in \mathcal{D}.$$

When put on paper, a bipartite graph has the annoying property that the arcs
tend to cross each other many times. This makes such a representation inconvenient
for carrying out the steps of the network simplex method. But there is a nice,
uncluttered, tabular representation of a bipartite graph that one can use when applying
the simplex method. To discover this tabular representation, first suppose that the
graph is laid out as shown in Figure 15.2. Now if we place the supplies and
demands on the nodes and the costs at the kinks in the arcs, then we get, for example,
the following simple tabular representation of a transportation problem:

(15.1) [Table: supplies, demands, and arc costs in tabular form.]

(the asterisks represent nonexistent arcs). The iterations of the simplex method can
be written in this tabular format by simply placing the dual variables where the
supplies and demands are and by placing the primal flows and dual slacks where
the arc costs are. Of course, some notation needs to be introduced to indicate which
cells are part of the current spanning tree. For example, the tree could be indicated
by putting a box around the primal flow values. Here is a (nonoptimal) tree solution
for the data given above:

(15.2) [Table: tree solution in tabular form.]

(the solution to this problem is left as an exercise).

Figure 15.1. A bipartite graph: the network for a transportation problem.

Figure 15.2. The bipartite graph from Figure 15.1 laid out in
a rectangular fashion, with supplies and demands given at the
nodes, and with costs given on the arcs.

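In the case where every supply node is connected to every demand node, the
problem is called the Hitchcock Transportation Problem. In this case, the equations
defining the problem are especially simple. Indeed, if we denote the supplies at the
supply nodes by $r_i$, $i \in \mathcal{S}$, and if we denote the demands at the demand nodes by
$s_j$, $j \in \mathcal{D}$, then we can write the problem as

$$
\begin{array}{lll}
\text{minimize} & \displaystyle\sum_{i \in \mathcal{S}} \sum_{j \in \mathcal{D}} c_{ij} x_{ij} & \\
\text{subject to} & \displaystyle\sum_{j \in \mathcal{D}} x_{ij} = r_i, & i \in \mathcal{S}, \\
& \displaystyle\sum_{i \in \mathcal{S}} x_{ij} = s_j, & j \in \mathcal{D}, \\
& x_{ij} \ge 0, & i \in \mathcal{S},\ j \in \mathcal{D}.
\end{array}
$$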
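
To make the formulation concrete, here is a minimal Python sketch of a Hitchcock transportation problem with two supply nodes and two demand nodes. The node names, supplies, demands, and costs are invented for illustration and do not come from the text. The sketch finds a cheapest flow by brute-force enumeration of integer flows, which is only sensible for a toy instance; it is not the network simplex method, and it assumes that integer data admit an integer optimal flow.

```python
from itertools import product

# Hypothetical data: supplies r_i, demands s_j, and costs c_ij.
supply = {"s1": 3, "s2": 4}
demand = {"d1": 2, "d2": 5}
cost = {("s1", "d1"): 5, ("s1", "d2"): 2,
        ("s2", "d1"): 3, ("s2", "d2"): 4}


def is_feasible(x):
    """Check the supply equations, the demand equations, and nonnegativity."""
    rows = all(sum(x[i, j] for j in demand) == supply[i] for i in supply)
    cols = all(sum(x[i, j] for i in supply) == demand[j] for j in demand)
    return rows and cols and all(value >= 0 for value in x.values())


def total_cost(x):
    """Objective value: sum of c_ij * x_ij over all arcs."""
    return sum(cost[arc] * x[arc] for arc in cost)


def brute_force():
    """Enumerate integer flows; a flow on (i, j) never exceeds min(r_i, s_j)."""
    arcs = list(cost)
    ranges = [range(min(supply[i], demand[j]) + 1) for (i, j) in arcs]
    best = None
    for values in product(*ranges):
        x = dict(zip(arcs, values))
        if is_feasible(x) and (best is None or total_cost(x) < total_cost(best)):
            best = x
    return best


best = brute_force()
```

For these data every feasible flow is determined by the single value on arc (s1, d1), so the cost is a linear function of one parameter and the minimum sits at an endpoint of its range.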

2.  The  Assignment  Problem 


Given a set $\mathcal{S}$ of $m$ people, a set $\mathcal{D}$ of $m$ tasks, and for each $i \in \mathcal{S}$, $j \in \mathcal{D}$ a cost
$c_{ij}$ associated with assigning person $i$ to task $j$, the assignment problem is to assign
each person to one and only one task in such a manner that each task gets covered
by someone and the total cost of the assignments is minimized. If we let

$$x_{ij} = \begin{cases} 1 & \text{if person } i \text{ is assigned task } j, \\ 0 & \text{otherwise,} \end{cases}$$

then the objective function can be written as

$$\text{minimize} \quad \sum_{i \in \mathcal{S}} \sum_{j \in \mathcal{D}} c_{ij} x_{ij}.$$

The constraint that each person is assigned exactly one task can be expressed
simply as

$$\sum_{j \in \mathcal{D}} x_{ij} = 1, \quad \text{for all } i \in \mathcal{S}.$$

Also, the constraint that every task gets covered by someone is just

$$\sum_{i \in \mathcal{S}} x_{ij} = 1, \quad \text{for all } j \in \mathcal{D}.$$

ies 

Except for the assumed integrality of the decision variables, $x_{ij}$, the assignment
problem is just a Hitchcock transportation problem in which the supply at every
supply node (person) is one and the demand at every demand node (task) is also one.
This Hitchcock transportation problem therefore is called the LP-relaxation of the
assignment problem. It is easy to see that every feasible solution to the assignment
problem is a feasible solution for its LP-relaxation. Furthermore, every integral
feasible solution to the LP-relaxation is a feasible solution to the assignment problem.
Since the network simplex method applied to the LP-relaxation produces an
integral solution, it therefore follows that the method solves not only the LP-relaxation
but also the assignment problem itself. We should note that this is a very special
and important feature of the network simplex method. For example, had we used
the primal-dual interior-point method to solve the LP-relaxation, there would be
no guarantee that the solution obtained would be integral (unless the problem has a
unique optimal solution, in which case any LP solver would find the same, integral
answer; but typical assignment problems have alternate optimal solutions, and an
interior-point method will report a convex combination of all of them).
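
The remark about alternate optimal solutions can be illustrated on a small instance. The sketch below uses a hypothetical 3-by-3 cost matrix (invented for this example) and enumerates all assignments by brute force, which is not the network simplex method. It then averages the optimal assignments to produce one particular convex combination: a point that is feasible for the LP-relaxation and has the same cost, but is fractional.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical costs: C[i][j] is the cost of assigning person i to task j.
C = [[4, 2, 8],
     [4, 3, 7],
     [3, 1, 6]]
m = len(C)


def assignment_cost(p):
    """Cost of the assignment in which person i performs task p[i]."""
    return sum(C[i][p[i]] for i in range(m))


# Enumerate all m! assignments and keep every optimal one.
best = min(assignment_cost(p) for p in permutations(range(m)))
optimal = [p for p in permutations(range(m)) if assignment_cost(p) == best]

# Average the optimal 0/1 solutions: x[i][j] is the fraction of optimal
# assignments in which person i performs task j.
x = [[Fraction(sum(1 for p in optimal if p[i] == j), len(optimal))
      for j in range(m)]
     for i in range(m)]

row_sums = [sum(x[i]) for i in range(m)]
col_sums = [sum(x[i][j] for i in range(m)) for j in range(m)]
relaxed_cost = sum(C[i][j] * x[i][j] for i in range(m) for j in range(m))
is_fractional = any(entry.denominator != 1 for row in x for entry in row)
```

Here three different assignments attain the minimum cost, so their average satisfies every constraint of the relaxation with the same objective value while having non-integer entries.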

3.  The  Shortest-Path  Problem 

Roughly speaking, the shortest-path problem is to find, well, the shortest path
from one specific node to another in a network $(\mathcal{N}, \mathcal{A})$. In contrast to earlier usage,
the arcs connecting successive nodes on a path must point in the direction of travel.
Such paths are sometimes referred to as directed paths. To determine a shortest path,
we assume that we are given the length of each arc. To be consistent with earlier
notations, let us assume that the length of arc $(i, j)$ is denoted by $c_{ij}$. Naturally, we
assume that these lengths are nonnegative.

To find the shortest path from one node (say, $s$) to another (say, $r$), we will
see that it is necessary to compute the shortest path from many, perhaps all, other
nodes to $r$. Hence, we define the shortest-path problem as the problem of finding
the shortest path from every node in $\mathcal{N}$ to a specific node $r \in \mathcal{N}$. The destination
node $r$ is called the root node.

3.1. Network Flow Formulation. The shortest-path problem can be formulated
as a network flow problem. Indeed, put a supply of one unit at each nonroot
node, and put the appropriate amount of demand at the root (to meet the total
supply). The cost on each arc is just the length of the arc. Suppose that we’ve solved
this network flow problem. Then the shortest path from a node $i$ to $r$ can be found
by simply following the arcs from $i$ to $r$ on the optimal spanning tree. Also, the
length of the shortest path is $y_r^* - y_i^*$.

While the network simplex method can be used to solve the shortest-path
problem, there are faster algorithms designed especially for it. To describe these
algorithms, let us denote the distance from $i$ to $r$ by $v_i$. These distances (or
approximations thereof) are called labels in the networks literature. Some algorithms compute
these distances systematically in a certain order. These algorithms are called
label-setting algorithms. Other algorithms start with an estimate for these labels and then
iteratively correct the estimates until the optimal values are found. Such algorithms
are called label-correcting algorithms.

Note that if we set $y_r^*$ to zero in the network flow solution, then the labels are
simply the negative of the optimal dual variables. In the following subsections, we
shall describe simple examples of label-setting and label-correcting algorithms.

3.2. A Label-Correcting Algorithm. To describe a label-correcting
algorithm, we need to identify a system of equations that characterize the
shortest-path distances. First of all, clearly

$$v_r = 0.$$

What can we say about the labels at other nodes, say, node $i$? Suppose that we select
an arc $(i, j)$ that leaves node $i$. If we were to travel along this arc and then, from
node $j$, travel along the shortest path to $r$, then the distance to the root would be
$c_{ij} + v_j$. So, from node $i$, we should select the arc that minimizes these distances.
This selection will then give the shortest distance from $i$ to $r$. That is,

$$v_i = \min\{c_{ij} + v_j : (i, j) \in \mathcal{A}\}, \quad i \ne r. \tag{15.3}$$

The argument we have just made is called the principle of dynamic programming,
and equation (15.3) is called Bellman’s equation. Dynamic programming is a whole
subject of its own; we shall only illustrate some of its basic ideas by our study of
the shortest-path problem. In the dynamic programming literature, the set of $v_i$’s
viewed as a function defined on the nodes is called the value function (hence the
notation).

From Bellman’s equation, it is easy to identify the arcs one would travel on in
a shortest-path route to the root. Indeed, these arcs are given by

$$\mathcal{T} = \{(i, j) \in \mathcal{A} : v_i = c_{ij} + v_j\}.$$

This set of arcs may contain alternate shortest paths to the root, and so the set is not
necessarily a tree. Nonetheless, any path that follows these arcs will get to the root
on a shortest-path route.
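
As a small illustration of how the arc set above can be read off from the labels, the following Python sketch uses a hypothetical four-node network with root "r". The arc lengths are invented for this example, and the labels are simply asserted to be the shortest distances (they can be checked by hand). The helper names are ours. Exact equality is used because the data are integers, and positive arc lengths are assumed so that following the selected arcs always makes progress toward the root.

```python
# Hypothetical network: arcs maps (i, j) to the length c_ij.
arcs = {("a", "b"): 1, ("a", "c"): 4, ("b", "c"): 2,
        ("b", "r"): 6, ("c", "r"): 1}

# Shortest distances to the root "r" for this network (assumed known here).
v = {"a": 4, "b": 3, "c": 1, "r": 0}


def shortest_path_arcs(arcs, v):
    """Arcs attaining the minimum in Bellman's equation: v_i = c_ij + v_j."""
    return {(i, j) for (i, j), c in arcs.items() if v[i] == c + v[j]}


def follow(selected, start, root):
    """Walk from start to root using only the selected arcs."""
    path = [start]
    while path[-1] != root:
        i = path[-1]
        # Any selected arc leaving i lies on a shortest route; take the first.
        j = next(head for (tail, head) in sorted(selected) if tail == i)
        path.append(j)
    return path


T = shortest_path_arcs(arcs, v)
route = follow(T, "a", "r")
```

In this instance the direct arcs (a, c) and (b, r) are not selected, because the detours through b and c are shorter.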

3.2.1. Method of Successive Approximation. Bellman’s equation is an implicit
system of equations for the values $v_i$, $i \in \mathcal{N}$. Implicit equations such as this arise
frequently and beg to be solved by starting with a guess at the solution, using this
guess in the right-hand side, and computing a new guess by evaluating the
right-hand side. This approach is called the method of successive approximations. To
apply it to the shortest-path problem, we initialize the labels as follows:

$$v_i^{(0)} = \begin{cases} 0, & i = r, \\ \infty, & i \ne r. \end{cases}$$

Then the updates are computed using Bellman’s equation:

$$v_i^{(k+1)} = \begin{cases} 0, & i = r, \\ \min\{c_{ij} + v_j^{(k)} : (i, j) \in \mathcal{A}\}, & i \ne r. \end{cases}$$
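
The iteration just described can be sketched in a few lines of Python. The function name and the example network are our own (the network is hypothetical, with root "r"), and the loop stops as soon as one full update leaves every label unchanged. This is a straightforward transcription of the update formula, not an optimized implementation.

```python
import math


def successive_approximation(nodes, arcs, root):
    """Iterate Bellman's equation from the initial labels (0 at root, infinity elsewhere).

    nodes: list of node names; arcs: dict mapping (i, j) to the length c_ij.
    Returns the final labels and the number of updates performed, counting
    the last update that changes nothing.
    """
    v = {i: (0 if i == root else math.inf) for i in nodes}
    updates = 0
    while True:
        new_v = {}
        for i in nodes:
            if i == root:
                new_v[i] = 0
            else:
                # Minimum of c_ij + v_j over arcs leaving i (infinite if none).
                new_v[i] = min((c + v[j] for (tail, j), c in arcs.items()
                                if tail == i), default=math.inf)
        updates += 1
        if new_v == v:
            return v, updates
        v = new_v


# Hypothetical network used for illustration.
nodes = ["a", "b", "c", "r"]
arcs = {("a", "b"): 1, ("a", "c"): 4, ("b", "c"): 2,
        ("b", "r"): 6, ("c", "r"): 1}
labels, updates = successive_approximation(nodes, arcs, "r")
```

On this example the label of node "a" passes through the values infinity, 5, and 4 before settling, which shows the estimates being corrected as longer paths are taken into account.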


3.2.2. Efficiency. The algorithm stops when an update leaves all the $v_i$’s
unchanged. It turns out that the algorithm is guaranteed to stop in no more than $m$
iterations. To see why, it suffices to note that $v_i^{(k)}$ has a very simple description:
it is the length of the shortest path from $i$ to $r$ that has $k$ or fewer arcs in the path.
(It is not hard to convince yourself with an induction on $k$ that this is correct, but
a pedantic proof requires introducing a significant amount of added notation that
we wish to avoid.) Hence, the label-correcting algorithm cannot take more than $m$
iterations, since every shortest path can visit each node at most once. Since each
iteration involves looking at every arc of the network, it follows that the number
of additions/comparisons needed to solve a shortest-path problem using the
label-correcting algorithm is about $nm$.

    Initialize:
        $\mathcal{F} = \emptyset$        (the set of finished nodes)
        $v_j = 0$ if $j = r$,  $v_j = \infty$ if $j \ne r$

    while ($|\mathcal{F}^c| > 0$) {
        $j = \operatorname{argmin}\{v_k : k \notin \mathcal{F}\}$
        $\mathcal{F} \leftarrow \mathcal{F} \cup \{j\}$
        for each $i$ for which $(i, j) \in \mathcal{A}$ and $i \notin \mathcal{F}$ {
            if ($c_{ij} + v_j < v_i$) {
                $v_i = c_{ij} + v_j$
                $h_i = j$
            }
        }
    }

Figure  15.3.  Dijkstra’s  shortest-path  algorithm. 


3.3. A Label-Setting Algorithm. In this section, we describe Dijkstra’s
algorithm for solving shortest-path problems. The data structures that are carried from
one iteration to the next are a set $\mathcal{F}$ of finished nodes and two arrays indexed by the
nodes of the graph. The first array, $v_j$, $j \in \mathcal{N}$, is just the array of labels. The second
array, $h_i$, $i \in \mathcal{N}$, indicates the next node to visit from node $i$ in a shortest path. As
the algorithm proceeds, the set $\mathcal{F}$ contains those nodes for which the shortest path
has already been found. This set starts out empty. Each iteration of the algorithm
adds one node to it. This is why the algorithm is called a label-setting algorithm,
since each iteration sets one label to its optimal value. For finished nodes, the labels
are fixed at their optimal values. For each unfinished node, the label has a temporary
value, which represents the length of the shortest path from that node to the root,
subject to the condition that all intermediate nodes on the path must be finished
nodes. At those nodes for which no such path exists, the temporary label is set to
infinity (or, in practice, a large positive number).

The algorithm is initialized by setting all the labels to infinity except for the
root node, whose label is set to 0. Also, the set of finished nodes is initialized
to the empty set. Then, as long as there remain unfinished nodes, the algorithm
selects an unfinished node $j$ having the smallest temporary label, adds it to the set of
finished nodes, and then updates each unfinished “upstream” neighbor $i$ by setting
its label to $c_{ij} + v_j$ if this value is smaller than the current value $v_i$. For each
neighbor $i$ whose label gets changed, $h_i$ is set to $j$. The algorithm is summarized in
Figure 15.3.
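
The steps just described can be sketched in Python as follows. The function and variable names are ours, and the example network (with root "r") is hypothetical. To stay close to the description, the unfinished node with the smallest label is found by a plain scan rather than with a priority queue, which a practical implementation would likely prefer.

```python
import math


def dijkstra_to_root(nodes, arcs, root):
    """Label-setting shortest paths from every node to the root.

    nodes: list of node names; arcs: dict mapping (i, j) to a nonnegative
    length c_ij. Returns the labels v and the next-node array h.
    """
    v = {j: (0 if j == root else math.inf) for j in nodes}
    h = {}
    finished = set()
    while len(finished) < len(v):
        # Select an unfinished node with the smallest temporary label.
        j = min((k for k in v if k not in finished), key=lambda k: v[k])
        finished.add(j)
        # Update each unfinished upstream neighbor i of j.
        for (i, head), c in arcs.items():
            if head == j and i not in finished and c + v[j] < v[i]:
                v[i] = c + v[j]
                h[i] = j
    return v, h


def path_to_root(h, start, root):
    """Follow the next-node array from start until the root is reached."""
    path = [start]
    while path[-1] != root:
        path.append(h[path[-1]])
    return path


# Hypothetical network used for illustration.
nodes = ["a", "b", "c", "r"]
arcs = {("a", "b"): 1, ("a", "c"): 4, ("b", "c"): 2,
        ("b", "r"): 6, ("c", "r"): 1}
v, h = dijkstra_to_root(nodes, arcs, "r")
route = path_to_root(h, "a", "r")
```

On this example the nodes are finished in the order r, c, b, a, and the label of b is lowered from 6 to 3 when c is finished.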

4. Upper-Bounded Network Flow Problems

Some  real-world  network  flow  problems  involve  upper  bounds  on  the  amount 
of  flow  that  an  arc  can  handle.  There  are  modified  versions  of  the  network  simplex 
method  that  allow  one  to  handle  such  upper  bounds  implicitly,  but  we  shall  simply 
show  how  to  reduce  an  upper-bounded  network  flow  problem  to  one  without  upper 
bounds. 

Let us consider just one arc, $(i, j)$, in a network flow problem. Suppose that
there is an upper bound of $u_{ij}$ on the amount of flow that this arc can handle. We
can express this bound as an extra constraint:

$$0 \le x_{ij} \le u_{ij}.$$

Introducing a slack variable, $t_{ij}$, we can rewrite these bound constraints as

$$x_{ij} + t_{ij} = u_{ij}, \qquad x_{ij},\ t_{ij} \ge 0.$$

If we look at the flow balance constraints and focus our attention on the variables
$x_{ij}$ and $t_{ij}$, we see that they appear in only three constraints: the flow balance
constraints for nodes $i$ and $j$ and the upper bound constraint,

$$
\begin{array}{rcl}
\cdots - x_{ij} \cdots & = & -b_i \\
\cdots + x_{ij} \cdots & = & -b_j \\
x_{ij} + t_{ij} & = & u_{ij}.
\end{array}
$$

If we subtract the last constraint from the second one, we get

$$
\begin{array}{rcl}
\cdots - x_{ij} \cdots & = & -b_i \\
\cdots - t_{ij} \cdots & = & -b_j - u_{ij} \\
x_{ij} + t_{ij} & = & u_{ij}.
\end{array}
$$

Note that we have restored a network structure in the sense that each column again
has one $+1$ and one $-1$ coefficient. To make a network picture, we need to create
a new node (corresponding to the third row). Let us call this node $k$. The network
transformation is shown in Figure 15.4.
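
The transformation can be expressed as a small function acting on the problem data. The sketch below is one way to do it under our own conventions: supplies are stored in a dictionary (negative values are demands), costs are keyed by arc, and the new node is given the hypothetical name `("k", i, j)`. The arc into the new node from $i$ carries the original flow at the original cost, the arc from $j$ carries the slack at zero cost, node $j$ gains $u_{ij}$ units of supply, and the new node has a demand of $u_{ij}$.

```python
def remove_upper_bound(supply, cost, i, j, u):
    """Replace the capacitated arc (i, j) by two uncapacitated arcs into a new node.

    supply: dict node -> supply (demands are negative); cost: dict arc -> cost;
    u: the upper bound on the flow along (i, j).
    Returns the new supply dict, the new cost dict, and the new node's name.
    """
    k = ("k", i, j)                     # hypothetical name for the new node
    new_supply = dict(supply)
    new_cost = dict(cost)
    c_ij = new_cost.pop((i, j))
    new_cost[(i, k)] = c_ij             # carries the original flow x_ij
    new_cost[(j, k)] = 0                # carries the slack t_ij
    new_supply[j] = new_supply.get(j, 0) + u
    new_supply[k] = -u                  # node k demands u_ij units
    return new_supply, new_cost, k


# Hypothetical two-node example: ship 2 units from "i" to "j" with capacity 5.
supply = {"i": 2, "j": -2}
cost = {("i", "j"): 3}
new_supply, new_cost, k = remove_upper_bound(supply, cost, "i", "j", 5)
```

In the example, a flow of 2 on the arc into the new node from "i" together with a slack flow of 3 from "j" exactly meets the new node's demand of 5, and total supply still equals total demand.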

Figure 15.4. Adding a new node, $k$, to accommodate an arc
$(i, j)$ having an upper bound $u_{ij}$ on its flow capacity.

We can use the above transformation to derive optimality conditions for
upper-bounded network flow problems. Indeed, let us consider an optimal solution to
the transformed problem. Clearly, if $x_{ik}$ is zero, then the corresponding dual slack
$z_{ik} = y_i + c_{ij} - y_k$ is nonnegative:

$$y_i + c_{ij} - y_k \ge 0. \tag{15.4}$$

Furthermore, the back-flow $x_{jk}$ must be at the upper bound rate:

$$x_{jk} = u_{ij}.$$

Hence, by complementarity, the corresponding dual slack must vanish:

$$z_{jk} = y_j - y_k = 0. \tag{15.5}$$

Combining (15.4) with (15.5), we see that

$$y_i + c_{ij} \ge y_j.$$

On the other hand, if the flow on arc $(i, k)$ is at the capacity value, then the back-flow
on arc $(j, k)$ must vanish. The complementarity conditions then say that

$$z_{ik} = y_i + c_{ij} - y_k = 0, \qquad z_{jk} = y_j - y_k \ge 0.$$

Combining these two statements, we get

$$y_i + c_{ij} \le y_j.$$

Finally, if $0 < x_{ij} < u_{ij}$, then both slack variables vanish, and this implies that

$$y_i + c_{ij} = y_j.$$

These properties can then be summarized as follows:

$$
\begin{array}{rcl}
x_{ij} = 0 & \Longrightarrow & y_i + c_{ij} \ge y_j \\
x_{ij} = u_{ij} & \Longrightarrow & y_i + c_{ij} \le y_j \\
0 < x_{ij} < u_{ij} & \Longrightarrow & y_i + c_{ij} = y_j.
\end{array}
\tag{15.6}
$$
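
The three cases of (15.6) are easy to check mechanically for a given flow and a given set of dual variables. The following Python sketch does so on a hypothetical three-node network. The data, the function name, and the tolerance are our own choices, and the dual variables are simply supplied rather than computed. The check tests the conditions only; it does not verify flow balance.

```python
import math


def satisfies_conditions(x, u, c, y, tol=1e-9):
    """Check the three cases of (15.6) arc by arc for flows x and duals y."""
    for (i, j), flow in x.items():
        slack = y[i] + c[i, j] - y[j]
        if flow <= tol:                     # x_ij = 0
            ok = slack >= -tol
        elif flow >= u[i, j] - tol:         # x_ij = u_ij
            ok = slack <= tol
        else:                               # 0 < x_ij < u_ij
            ok = abs(slack) <= tol
        if not ok:
            return False
    return True


# Hypothetical data: 3 units go from "a" to "b", either directly (cost 1,
# capacity 2) or through "c" (total cost 3, no capacity limit).
c = {("a", "b"): 1, ("a", "c"): 1, ("c", "b"): 2}
u = {("a", "b"): 2, ("a", "c"): math.inf, ("c", "b"): math.inf}
y = {"a": 0, "c": 1, "b": 3}

saturating = {("a", "b"): 2, ("a", "c"): 1, ("c", "b"): 1}
avoiding = {("a", "b"): 0, ("a", "c"): 3, ("c", "b"): 3}

saturating_ok = satisfies_conditions(saturating, u, c, y)
avoiding_ok = satisfies_conditions(avoiding, u, c, y)
```

The flow that fills the cheap direct arc to capacity passes the check, whereas the flow that leaves the direct arc empty fails it with these dual variables, since $y_a + c_{ab} < y_b$ is incompatible with $x_{ab} = 0$.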

While upper-bounded network flow problems have important applications, we
admit that our main interest in them is more narrowly focused. It stems from their
relation to an important theorem called the Max-Flow Min-Cut Theorem. We shall
state and prove this theorem in the next section. The only tool we need to prove this
theorem is the above result giving the complementarity conditions when there are
upper bounds on arcs. So on with the show.

Figure 15.5. A cut set $C$ for a maximum flow problem.


5.  The  Maximum-Flow  Problem 

The subject of this section is the class of problems called maximum-flow
problems. These problems form an important topic in the theory of network flows. There
are very efficient algorithms for solving them, and they appear as subproblems in
many algorithms for the general network flow problem. However, our aim is rather
modest. We wish only to expose the reader to one important theorem in this subject,
which is called the Max-Flow Min-Cut Theorem.

Before we can state this theorem we need to set up the situation. Suppose that
we are given a network $(\mathcal{N}, \mathcal{A})$, a distinguished node $s \in \mathcal{N}$ called the source node,
a distinguished node $t \in \mathcal{N}$ called the sink node, and upper bounds on the arcs of
the network $u_{ij}$, $(i, j) \in \mathcal{A}$. For simplicity, we shall assume that the upper bounds
are all finite (although this is not really necessary). The objective is to “push” as
much flow from $s$ to $t$ as possible.

To solve this problem, we can convert it to an upper-bounded network flow
problem as follows. First, let $c_{ij} = 0$ for all arcs $(i, j) \in \mathcal{A}$, and let $b_i = 0$ for every
node $i \in \mathcal{N}$. Then add one extra arc $(t, s)$ connecting the sink node $t$ back to the
source node $s$, put a negative cost on this arc (say, $c_{ts} = -1$), and let it have infinite
capacity $u_{ts} = \infty$. Since the only nonzero cost is actually negative, it follows that
we shall actually make a profit by letting more and more flow circulate through the
network. But the upper bound on the arc capacities limits the amount of flow that it
is possible to push through.

In order to state the Max-Flow Min-Cut Theorem, we must define what
we mean by a cut. A cut, $C$, is a set of nodes that contains the source node but
does not contain the sink node (see Figure 15.5). The capacity of a cut is defined as

$$\kappa(C) = \sum_{\substack{i \in C \\ j \notin C}} u_{ij}.$$

Note that here and elsewhere in this section, the summations are over “original” arcs
that satisfy the indicated set membership conditions. That is, they don’t include the
arc that we added connecting from $t$ back to $s$. (If they did, the capacity of every cut
would be infinite, which is clearly not our intention.)

Flow balance tells us that the total flow along original arcs connecting the cut
set $C$ to its complement minus the total flow along original arcs that span these two
sets in the opposite direction must equal the amount of flow on the artificial arc
$(t, s)$. That is,

$$x_{ts} = \sum_{\substack{i \in C \\ j \notin C}} x_{ij} - \sum_{\substack{i \notin C \\ j \in C}} x_{ij}. \tag{15.7}$$

We  are  now  ready  to  state  the  Max-Flow  Min-Cut  Theorem. 

Theorem 15.1. The maximum value of $x_{ts}$ equals the minimum value of $\kappa(C)$.

Proof. The proof follows the usual sort of pattern common in subjects where
there is a sort of duality theory. First of all, we note that it follows from (15.7) that

$$x_{ts} \le \kappa(C) \tag{15.8}$$

for every feasible flow and every cut set $C$. Then all that is required is to exhibit a
feasible flow and a cut set for which this inequality is an equality.

Let $x^*_{ij}$, $(i,j) \in \mathcal{A}$, denote the optimal values of the primal variables, and let $y^*_i$,
$i \in \mathcal{N}$, denote the optimal values of the dual variables. Then the complementarity
conditions (15.6) imply that

(15.9)   $x^*_{ij} = 0$ whenever $y^*_i + c_{ij} > y^*_j$
(15.10)  $x^*_{ij} = u_{ij}$ whenever $y^*_i + c_{ij} < y^*_j$.

In particular,

$y^*_t - 1 \ge y^*_s$

(since $u_{ts} = \infty$). Put $C^* = \{k : y^*_k \le y^*_s\}$. Clearly, $C^*$ is a cut.

Consider an arc having its tail in $C^*$ and its head in the complement of $C^*$. It
follows from the definition of $C^*$ that $y^*_i \le y^*_s < y^*_j$. Since $c_{ij}$ is zero, we see
from (15.10) that $x^*_{ij} = u_{ij}$.

Now consider an original arc having its tail in the complement of $C^*$ and its
head in $C^*$ (i.e., bridging the two sets in the opposite direction). It follows then that
$y^*_j \le y^*_s < y^*_i$. Hence, we see from (15.9) that $x^*_{ij} = 0$.

Combining the observations of the last two paragraphs with (15.7), we see that

$x^*_{ts} = \sum_{i \in C^*,\; j \notin C^*} u_{ij} = u(C^*).$

In  light  of  (15.8),  the  proof  is  complete.  □ 
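The statement of the theorem can be checked numerically on a small instance. The sketch below is only an illustration: it computes the maximum flow with a shortest-augmenting-path method (Edmonds-Karp) rather than with the linear programming duality used in the proof, and it finds the minimum cut by brute force over all sets containing s but not t. The network `capacity` and the function names are hypothetical and do not come from the text.

```python
from collections import deque
from itertools import combinations

def max_flow(cap, s, t):
    """Maximum s-t flow value via shortest augmenting paths (Edmonds-Karp)."""
    res = {}  # residual capacities, with reverse arcs starting at zero
    for (i, j), u in cap.items():
        res[(i, j)] = res.get((i, j), 0) + u
        res.setdefault((j, i), 0)
    nodes = sorted({n for arc in res for n in arc})
    value = 0
    while True:
        # Breadth-first search for a path from s to t in the residual network.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            i = queue.popleft()
            for j in nodes:
                if j not in parent and res.get((i, j), 0) > 0:
                    parent[j] = i
                    queue.append(j)
        if t not in parent:
            return value  # no augmenting path remains
        # Trace the path back from t and push the bottleneck amount along it.
        path, j = [], t
        while parent[j] is not None:
            path.append((parent[j], j))
            j = parent[j]
        delta = min(res[arc] for arc in path)
        for i, j in path:
            res[(i, j)] -= delta
            res[(j, i)] += delta
        value += delta

def min_cut(cap, s, t):
    """Minimum of u(C) over all sets C with s in C and t not in C (brute force)."""
    nodes = {n for arc in cap for n in arc}
    others = sorted(nodes - {s, t})
    best = float("inf")
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            C = {s, *extra}
            u_C = sum(u for (i, j), u in cap.items() if i in C and j not in C)
            best = min(best, u_C)
    return best

# Hypothetical four-node network: arc -> upper bound u_ij.
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
print(max_flow(capacity, "s", "t"), min_cut(capacity, "s", "t"))
```

Brute-force enumeration is exponential in the number of nodes, so it serves only as a cross-check on tiny examples.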

Exercises 

15.1 Solve the transportation problem given in (15.1), using (15.2) for the
starting tree solution.

15.2  Solve  the  following  linear  programming  problem: 


maximize    $7x_1 - 3x_2 + 9x_3 + 2x_4$
subject to  $x_1 + x_2 \le 1$
            $x_3 + x_4 \le 1$
            $x_1 + x_3 \ge 1$
            $x_2 + x_4 \ge 1$
            $x_1, x_2, x_3, x_4 \ge 0.$

(Note:  there  are  two  greater-than-or-equal-to  constraints.) 

15.3  Bob,  Carol,  David,  and  Alice  are  stranded  on  a  desert  island.  Bob  and 
David  each  would  like  to  give  their  affection  to  Carol  or  to  Alice.  Food 
is  the  currency  of  trade  for  this  starving  foursome.  Bob  is  willing  to  pay 
Carol  7  clams  if  she  will  accept  his  affection.  David  is  even  more  keen 
and  is  willing  to  give  Carol  9  clams  if  she  will  accept  it.  Both  Bob  and 
David  prefer  Carol  to  Alice  (sorry  Alice).  To  quantify  this  preference, 
David  is  willing  to  pay  Alice  only  2  clams  for  his  affection.  Bob  is  even 
more  averse:  he  says  that  Alice  would  have  to  pay  him  for  it.  In  fact, 
she’d  have  to  pay  him  3  clams  for  his  affection.  Carol  and  Alice,  being 
proper  young  women,  will  accept  affection  from  one  and  only  one  of  the 
two  guys.  Between  the  two  of  them  they  have  decided  to  share  the  clams 
equally  between  them  and  hence  their  objective  is  simply  to  maximize 
the  total  number  of  clams  they  will  receive.  Formulate  this  problem  as  a 
transportation  problem.  Solve  it. 

15.4 Project Scheduling. This problem deals with the creation of a project schedule;
specifically, the project of building a house. The project has been
divided  into  a  set  of  jobs.  The  problem  is  to  schedule  the  time  at  which 
each  of  these  jobs  should  start  and  also  to  predict  how  long  the  project 
will  take.  Naturally,  the  objective  is  to  complete  the  project  as  quickly  as 
possible  (time  is  money!).  Over  the  duration  of  the  project,  some  of  the 
jobs  can  be  done  concurrently.  But,  as  the  following  table  shows,  certain 
jobs definitely can’t start until others are completed.

Job                               Duration (weeks)   Must be preceded by
0. Sign contract with buyer       0                  (none)
1. Framing                        2                  0
2. Roofing                        1                  1
3. Siding                         3                  1
4. Windows                        2.5                3
5. Plumbing                       1.5                3
6. Electrical                     2                  2, 4
7. Inside finishing               4                  5, 6
8. Outside painting               3                  2, 4
9. Complete the sale to buyer     0                  7, 8

One  possible  schedule  is  the  following: 


Job                               Start time
0. Sign contract with buyer       0
1. Framing                        1
2. Roofing                        4
3. Siding                         6
4. Windows                        10
5. Plumbing                       9
6. Electrical                     13
7. Inside finishing               16
8. Outside painting               14
9. Complete the sale to buyer     21

With this schedule, the project duration is 21 weeks (the difference
between the start times of jobs 9 and 0).

To  model  the  problem  as  a  linear  program,  introduce  the  following 
decision  variables: 

$t_j$ = the start time of job $j$.

(a)  Write  an  expression  for  the  objective  function,  which  is  to  minimize 
the  project  duration. 

(b) For each job j, write a constraint for each job i that must precede
j; the constraint should ensure that job j doesn’t start until job i is
finished. These are called precedence constraints.

15.5 Continuation. This problem generalizes the specific example of the previous
problem. A project consists of a set of jobs $\mathcal{J}$. For each job $j \in \mathcal{J}$
there is a certain set $\mathcal{P}_j$ of other jobs that must be completed before job j
can be started. (This is called the set of predecessors of job j.) One of the
jobs, say s, is the starting job; it has no predecessors. Another job, say t,
is the final (or terminal) job; it is not the predecessor of any other job. The
time it will take to do job j is denoted $d_j$ (the duration of the job).

The problem is to decide what time each job should begin so that
no job begins before its predecessors are finished, and the duration of the
entire project is minimized. Using the notations introduced above, write
out a complete description of this linear programming problem.

15.6 Continuation. Let $x_{ij}$ denote the dual variable corresponding to the precedence
constraint that ensures job j doesn’t start until job i finishes.

(a) Write out the dual to the specific linear program in Problem 15.4.

(b) Write out the dual to the general linear program in Problem 15.5.

(c) Describe how the optimal value of the dual variable $x_{ij}$ can be interpreted.

15.7 Continuation. The project scheduling problem can be represented on a
directed graph with arc weights as follows. The nodes of the graph correspond
to the jobs. The arcs correspond to the precedence relations. That
is, if job i must be completed before job j, then there is an arc pointing
from node i to node j. The weight on this arc is $d_i$.

(a)  Draw  the  directed  graph  associated  with  the  example  in  Problem  15.4, 
being  sure  to  label  the  nodes  and  write  the  weights  beside  the  arcs. 

(b)  Return  to  the  formulation  of  the  dual  from  Problem  15.6(a).  Give 
an  interpretation  of  that  dual  problem  in  terms  of  the  directed  graph 
drawn  in  Part  (a). 

(c) Explain why there is always an optimal solution to the dual problem
in which each variable is either 0 or 1.

(d) Write out the complementary slackness condition corresponding to
dual variable $x_{26}$.

(e)  Describe  the  dual  problem  in  the  language  of  the  original  project 
scheduling  model. 

15.8 Continuation. Here is an algorithm for computing optimal start times $t_j$.

1. List the jobs so that the predecessors of each job come
before it in the list.

2. Put $t_0 = 0$.

3. Go down the list of jobs and for job j put
$t_j = \max\{t_i + d_i : i \text{ is a predecessor of } j\}$.

(a) Apply this algorithm to the specific instance from Problem 15.4.
What are the start times of each of the jobs? What is the project
duration?

(b) Prove that the solution found in Part (a) is optimal by exhibiting a
corresponding dual solution and checking the usual conditions for
optimality. (Hint: The complementary slackness conditions may help
you find a dual solution.)
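The three steps of the algorithm in Problem 15.8 can be sketched in a few lines of code. The version below is a minimal illustration under stated assumptions: the function name, the dictionary-based data layout, and the tiny five-job instance are all hypothetical (the instance is deliberately not the house-building example), and jobs with no predecessors are given start time 0, which plays the role of Step 2.

```python
def earliest_start_times(duration, predecessors):
    """Forward pass: t_j = max over predecessors i of (t_i + d_i)."""
    # Step 1: list the jobs so that predecessors come before their successors.
    order, placed = [], set()
    remaining = set(duration)
    while remaining:
        ready = sorted(j for j in remaining
                       if set(predecessors.get(j, ())) <= placed)
        if not ready:
            raise ValueError("precedence relations contain a cycle")
        for j in ready:
            order.append(j)
            placed.add(j)
            remaining.discard(j)
    # Steps 2 and 3: a job without predecessors starts at 0; every other job
    # starts as soon as its last predecessor has finished.
    t = {}
    for j in order:
        t[j] = max((t[i] + duration[i] for i in predecessors.get(j, ())), default=0)
    return t

# Hypothetical instance: durations d_j and predecessor sets.
duration = {"start": 0, "a": 2, "b": 3, "c": 1, "end": 0}
predecessors = {"a": ["start"], "b": ["start"], "c": ["a", "b"], "end": ["c"]}

start_times = earliest_start_times(duration, predecessors)
project_duration = start_times["end"] - start_times["start"]
print(start_times, project_duration)
```

Here job c cannot start before week 3 because its slower predecessor b takes 3 weeks, so the project duration in this toy instance is 4 weeks.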

15.9 Currency Arbitrage. Consider the world’s currency market. Given two
currencies, say the Japanese Yen and the US Dollar, there is an exchange
rate between them (currently about 110 Yen to the Dollar). It is always
true that, if you convert money from one currency to another and then
back, you will end up with less than you started with. That is, the product
of the two exchange rates (one in each direction) between any pair of
currencies is always less than one. However, it sometimes happens that a
longer chain of conversions results in a gain. Such a lucky situation is
called an arbitrage. One can use a linear programming model to find such
situations when they exist.

Consider  the  following  table  of  exchange  rates  (which  is  actual  data 
from  the  Wall  Street  Journal  on  Nov  10,  1996): 


param rate:
          USD       Yen       Mark      Franc
USD       .         111.52    1.4987    5.0852
Yen       .008966   .         .013493   .045593
Mark      .6659     73.964    .         3.3823
Franc     .1966     21.933    .29507    .
;


It is not obvious, but the USD → Yen → Mark → USD conversion actually
makes $0.002 on each initial dollar.
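The claimed gain can be confirmed by multiplying the three rates along the cycle. The snippet below is a small sketch; the dictionary layout and the helper name `cycle_gain` are illustrative assumptions, while the numbers are the ones from the table (only the entries needed for this cycle are listed).

```python
# rate[(a, b)] = units of currency b received for one unit of currency a
rate = {
    ("USD", "Yen"): 111.52,
    ("Yen", "Mark"): 0.013493,
    ("Mark", "USD"): 0.6659,
}

def cycle_gain(rate, cycle):
    """Net gain per initial unit when converting along the given cycle."""
    amount = 1.0
    for a, b in zip(cycle, cycle[1:]):
        amount *= rate[(a, b)]
    return amount - 1.0

gain = cycle_gain(rate, ["USD", "Yen", "Mark", "USD"])
print(round(gain, 6))
```

The product 111.52 × 0.013493 × 0.6659 is slightly above one, which gives the gain of roughly $0.002 per dollar quoted in the exercise.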

To look for arbitrage possibilities, one can make a generalized network
model, which is a network flow model with the unusual twist that a
unit of flow that leaves one node arrives at the next node multiplied by a
scale factor (in our example, the currency conversion rate). For us, each
currency is represented by a node. There is an arc from each node to every
other node. A flow of one unit out of one node becomes a flow of a
different magnitude at the head node. For example, one dollar flowing out
of the USD node arrives at the Franc node as 5.0852 Francs.

Let $x_{ij}$ denote the flow from node (i.e. currency) i to node j. This
flow is measured in the currency of node i.

One  node  is  special;  it  is  the  home  node,  say  the  US  Dollars  (USD) 
node.  At  all  other  nodes,  there  must  be  flow  balance. 

(a)  Write  down  the  flow  balance  constraints  at  the  3  non-home  nodes 
(Franc,  Yen,  and  Mark). 

At the home node, we assume that there is a supply of one unit (to get
things started). Furthermore, at this node, flow balance will not be satisfied.
Instead one expects a net inflow. If it is possible to make this inflow
greater than one, then an arbitrage has been found. Let f be a variable
that represents this inflow.

(b) Using variable f to represent net inflow to the home node, write a
flow balance equation for the home node.

Of course, the primal objective is to maximize f.

(c) Using $y_i$ to represent the dual variable associated with the primal
constraint for currency i, write down the dual linear program. (Regard
the primal variable f as a free variable.)

Now consider the general case, which might involve hundreds of currencies
worldwide.

(d) Write down the model mathematically using $x_{ij}$ for the flow leaving
node i heading for node j (measured in the currency of node i),
$r_{ij}$ for the exchange rate when converting from currency i to currency j,
and f for the net inflow at the home node $i^*$.

(e) Write down the dual problem.

(f) Can you give an interpretation for the dual variables? Hint: It might
be helpful to think about the case where $r_{ji} = 1/r_{ij}$ for all i, j.

(g) Comment on the conditions under which your model will be unbounded
and/or infeasible.

Notes 

The Hitchcock problem was introduced by Hitchcock (1941). Dijkstra’s algorithm
was discovered by Dijkstra (1959).

The Max-Flow Min-Cut Theorem was proved independently by Elias et al.
(1956), by Ford and Fulkerson (1956) and, in the restricted case where the upper
bounds are all integers, by Kotzig (1956). Fulkerson and Dantzig (1955) also proved
the Max-Flow Min-Cut Theorem. Their proof uses duality, which is particularly
relevant to this chapter.

The classic references for dynamic programming are the books by Bellman
(1957) and Howard (1960). Further discussion of label-setting and label-correcting
algorithms can be found in the book by Ahuja et al. (1993).


CHAPTER  16 


Structural  Optimization 


This  final  chapter  on  network-type  problems  deals  with  finding  the  best  design 
of  a  structure  to  support  a  specified  load  at  a  fixed  set  of  points.  The  topology  of  the 
problem  is  described  by  a  graph  where  each  node  represents  a  joint  in  the  structure 
and each arc represents a potential member.¹ We shall formulate this problem as
a  linear  programming  problem  whose  solution  determines  which  of  the  potential 
members  to  include  in  the  structure  and  how  thick  each  included  member  must  be 
to  handle  the  load.  The  optimization  criterion  is  to  find  a  minimal  weight  structure. 
As  we  shall  see,  the  problem  bears  a  striking  resemblance  to  the  minimum-cost 
network  flow  problem  that  we  studied  in  Chapter  14. 

1.  An  Example 

We  begin  with  an  example.  Consider  the  graph  shown  in  Figure  16.1.  This 
graph  represents  a  structure  consisting  of  five  joints  and  eight  possible  members 
connecting  the  joints.  The  five  joints  and  their  coordinates  are  given  as  follows: 


Joint   Coordinates
1       (0.0, 0.0)
2       (6.0, 0.0)
3       (0.0, 8.0)
4       (6.0, 8.0)
5       (3.0, 12.0)

Since joints are analogous to nodes in a network, we shall denote the set of joints
by $\mathcal{N}$ and denote by m the number of joints. Also, since members are analogous to
arcs in network flows, we shall denote the set of them by $\mathcal{A}$. For the structure shown
in Figure 16.1, the set of members is

$\mathcal{A} = \{\{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}, \{3,4\}, \{3,5\}, \{4,5\}\}.$

Note that we enclosed the pairs of endjoints in braces to emphasize that their order
is irrelevant. For example, {2,3} and {3,2} refer to one and the same member
spanning between joints 2 and 3. In network flows, the graphs we considered were
directed graphs. Here, they are undirected. Also, the graphs here are embedded
in a d-dimensional Euclidean space (meaning that every node comes with a set of
coordinates indicating its location in d-dimensional space). No such embedding was
imposed before in our study of network flows, even though real-world network flow
problems often possess such an embedding.

¹ Civil engineers refer to beams as members.

Figure 16.1. Sample topology for a two-dimensional structure.

Following the standard convention of using braces to denote sets, we ought
to let $x_{\{i,j\}}$ denote the force exerted by member {i,j} on its endjoints. But the
braces are cumbersome. Hence, we shall write this force simply as $x_{ij}$, with the
understanding that $x_{ij}$ and $x_{ji}$ denote one and the same variable.

We  shall  assume  that  a  positive  force  represents  tension  in  the  member  (i.e.,  the 
member  is  pulling  “in”  on  its  two  endjoints)  and  that  a  negative  value  represents 
compression  (i.e.,  the  member  is  pushing  “out”  on  its  two  endjoints). 

If the structure is to be in equilibrium (i.e., not accelerating in some direction),
then forces must be balanced at each joint. Of course, we assume that there may
be a nonzero external load at each joint (this is the analogue of the external
supply/demand in the minimum-cost network flow problem). Hence, for each node i,
let $b_i$ denote the externally applied load. Note that each $b_i$ is a vector whose
dimension equals the dimension of the space in which the structure lies. For our example,
this dimension is 2. In general, we shall denote the spatial dimension by d.

Force balance imposes a number of constraints on the member forces. For
example, the force balance equations for joint 2 can be written as follows:

$x_{12} \begin{bmatrix} -1 \\ 0 \end{bmatrix} + x_{23} \begin{bmatrix} -0.6 \\ 0.8 \end{bmatrix} + x_{24} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = -\begin{bmatrix} b_2^1 \\ b_2^2 \end{bmatrix},$

where $b_2^1$ and $b_2^2$ denote the components of $b_2$. Note that the three vectors appearing
on the left are unit vectors pointing out from joint 2 along each of the corresponding
members.
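The three unit vectors in the joint 2 equation follow directly from the joint coordinates. As a quick check, the sketch below recomputes them; the helper name `unit_vector` and the dictionary `p` are illustrative choices, not notation from the text.

```python
import math

# Joint coordinates from the table for Figure 16.1.
p = {1: (0.0, 0.0), 2: (6.0, 0.0), 3: (0.0, 8.0), 4: (6.0, 8.0), 5: (3.0, 12.0)}

def unit_vector(p, i, j):
    """Unit vector pointing from joint i towards joint j."""
    diff = [b - a for a, b in zip(p[i], p[j])]
    length = math.hypot(*diff)
    return tuple(c / length for c in diff)

# Members meeting at joint 2 are {1,2}, {2,3}, and {2,4}.
u21 = unit_vector(p, 2, 1)
u23 = unit_vector(p, 2, 3)
u24 = unit_vector(p, 2, 4)
print(u21, u23, u24)
```

Member {2,3} has length 10, which is why its direction (-6, 8) scales to (-0.6, 0.8).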


2.  Incidence  Matrices 


If, for each joint i, we let $p_i$ denote its position vector, then the unit vectors
pointing along the arcs can be written as follows:

$u_{ij} = \frac{p_j - p_i}{\|p_j - p_i\|}, \qquad \{i,j\} \in \mathcal{A}.$

It is important to note that $u_{ji} = -u_{ij}$, since the first vector points from j towards
i, whereas the second points from i towards j. In terms of these notations, the force
balance equations can be expressed succinctly as

(16.1)  $\sum_{j : \{i,j\} \in \mathcal{A}} u_{ij} x_{ij} = -b_i, \qquad i = 1, 2, \ldots, m.$

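These equations can be written in matrix form as

(16.2)  $Ax = -b,$

where x denotes the vector consisting of the member forces, b denotes the vector
whose elements are the applied load vectors, and A is a matrix containing the unit
vectors pointing along the appropriate arcs. For our example, we have

$x^T = \begin{bmatrix} x_{12} & x_{13} & x_{14} & x_{23} & x_{24} & x_{34} & x_{35} & x_{45} \end{bmatrix},$

$A = \begin{bmatrix}
\begin{bmatrix} 1 \\ 0 \end{bmatrix} & \begin{bmatrix} 0 \\ 1 \end{bmatrix} & \begin{bmatrix} .6 \\ .8 \end{bmatrix} & & & & & \\
\begin{bmatrix} -1 \\ 0 \end{bmatrix} & & & \begin{bmatrix} -.6 \\ .8 \end{bmatrix} & \begin{bmatrix} 0 \\ 1 \end{bmatrix} & & & \\
& \begin{bmatrix} 0 \\ -1 \end{bmatrix} & & \begin{bmatrix} .6 \\ -.8 \end{bmatrix} & & \begin{bmatrix} 1 \\ 0 \end{bmatrix} & \begin{bmatrix} .6 \\ .8 \end{bmatrix} & \\
& & \begin{bmatrix} -.6 \\ -.8 \end{bmatrix} & & \begin{bmatrix} 0 \\ -1 \end{bmatrix} & \begin{bmatrix} -1 \\ 0 \end{bmatrix} & & \begin{bmatrix} -.6 \\ .8 \end{bmatrix} \\
& & & & & & \begin{bmatrix} -.6 \\ -.8 \end{bmatrix} & \begin{bmatrix} .6 \\ -.8 \end{bmatrix}
\end{bmatrix},
\qquad
b = \begin{bmatrix} b_1^1 \\ b_1^2 \\ b_2^1 \\ b_2^2 \\ b_3^1 \\ b_3^2 \\ b_4^1 \\ b_4^2 \\ b_5^1 \\ b_5^2 \end{bmatrix}.$

(The block rows of A correspond to joints 1 through 5, and the columns correspond to
the members in the order in which they appear in x.)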

Note that we have written A as a matrix of 2-vectors by putting “inner” brackets
around appropriate pairs of entries. These inner brackets could of course be
dropped; they are included simply to show how the constraints match up with
(16.1).

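In network flows, an incidence matrix is characterized by the property that every
column of the matrix has exactly two nonzero entries, one +1 and one −1. Here, the
matrix A is characterized by the property that, when viewed as a matrix of d-vectors,
every column has two nonzero entries that are unit vectors pointing in opposite
directions from each other. Generically, matrix A can be written as follows:

$A = \begin{bmatrix} & \vdots & \\ \cdots & u_{ij} & \cdots \\ & \vdots & \\ \cdots & u_{ji} & \cdots \\ & \vdots & \end{bmatrix}$

(the column shown is the one for member {i,j}; $u_{ij}$ sits in block row i, $u_{ji}$ sits in
block row j, and all other entries of that column are zero), where, for each $\{i,j\} \in \mathcal{A}$,

$u_{ij},\, u_{ji} \in \mathbb{R}^d, \qquad u_{ij} = -u_{ji}, \qquad \text{and} \qquad \|u_{ij}\| = \|u_{ji}\| = 1.$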

By  analogy  with  network  flows,  the  matrix  A  is  called  an  incidence  matrix.  This 
definition  is  a  strict  generalization  of  the  definition  we  had  before,  since,  for  d  = 
1,  the  current  notion  reduces  to  the  network  flows  notion.  Incidence  matrices  for 
network  flows  enjoy  many  useful  properties.  In  the  following  sections,  we  shall 
investigate  the  extent  to  which  these  properties  carry  over  to  our  generalized  notion. 


3.  Stability 

Recall that for network flows, the sum of the rows of the incidence matrix vanishes
and that if the network is connected, this is the only redundancy. For d > 1,
the situation is similar. Clearly, the sum of the rows vanishes. But is this the only
redundancy? To answer this question, we need to look for nonzero row vectors $y^T$
for which $y^T A = 0$. The set of all such row vectors is a subspace of the set of all
row vectors. Our aim is to find a basis for this subspace and, in particular, to identify
its dimension. To this end, first write y in component form as $y^T = [\, y_1^T \;\cdots\; y_m^T \,]$,
where each of the entries $y_i$, $i = 1, 2, \ldots, m$, is a d-vector (transposed to make
it into a row vector). Multiplying this row vector against each column of A, we
see that $y^T A = 0$ if and only if

(16.3)  $y_i^T u_{ij} + y_j^T u_{ji} = 0$, for all $\{i,j\} \in \mathcal{A}$.

There are many choices of y that yield a zero row combination. For example, we
can take any vector $v \in \mathbb{R}^d$ and put

$y_i = v$, for every $i \in \mathcal{N}$.

Substituting this choice of $y_i$’s into the left-hand side of (16.3), we get

$y_i^T u_{ij} + y_j^T u_{ji} = v^T u_{ij} + v^T u_{ji} = v^T u_{ij} - v^T u_{ij} = 0.$

This set of choices shows that the subspace is at least d-dimensional.

But there are more! They are defined in terms of skew symmetric matrices. A
matrix R is called skew symmetric if $R^T = -R$. A simple but important property
of skew symmetric matrices is that, for every vector $\xi$,

(16.4)  $\xi^T R \xi = 0$

(see Exercise 16.1). We shall make use of this property shortly. Now, to give more
choices of y, let R be a d × d skew symmetric matrix and put

$y_i = R p_i$, for every $i \in \mathcal{N}$

(recall that $p_i$ denotes the position vector for joint i). We need to check (16.3).
Substituting this definition of the y’s into the left-hand side in (16.3), we see that

$y_i^T u_{ij} + y_j^T u_{ji} = p_i^T R^T u_{ij} + p_j^T R^T u_{ji}$
$\qquad = -p_i^T R u_{ij} - p_j^T R u_{ji}$
$\qquad = (p_j - p_i)^T R u_{ij}.$

Now substituting in the definition of $u_{ij}$, we get

$(p_j - p_i)^T R u_{ij} = \frac{(p_j - p_i)^T R (p_j - p_i)}{\|p_j - p_i\|}.$

Finally, by putting $\xi = p_j - p_i$ and using property (16.4) of skew symmetric matrices,
we see that the numerator on the right vanishes. Hence, (16.3) holds.
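Both families of redundancies can be verified numerically on the five-joint example by evaluating the left-hand side of (16.3) for every member. The sketch below is an illustration only: the helper names and the particular translation vector `v` are arbitrary choices, and the rotation uses the 2 × 2 skew symmetric matrix R with rows (0, −1) and (1, 0), for which $Rp = (-p_2, p_1)$.

```python
import math

p = {1: (0.0, 0.0), 2: (6.0, 0.0), 3: (0.0, 8.0), 4: (6.0, 8.0), 5: (3.0, 12.0)}
members = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]

def unit(i, j):
    """Unit vector u_ij pointing from joint i towards joint j."""
    diff = [b - a for a, b in zip(p[i], p[j])]
    length = math.hypot(*diff)
    return tuple(c / length for c in diff)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residuals(y):
    """Left-hand side of (16.3), y_i^T u_ij + y_j^T u_ji, for every member."""
    return [dot(y[i], unit(i, j)) + dot(y[j], unit(j, i)) for i, j in members]

# v-type choice: the same (arbitrary) vector v at every joint.
v = (2.0, -1.0)
y_translation = {i: v for i in p}

# R-type choice: y_i = R p_i with R = [[0, -1], [1, 0]], i.e. R p = (-p2, p1).
y_rotation = {i: (-p[i][1], p[i][0]) for i in p}

print(max(abs(r) for r in residuals(y_translation)))
print(max(abs(r) for r in residuals(y_rotation)))
```

Both printed values should be zero up to floating-point rounding, in agreement with the algebra above.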

How many redundancies have we found? For d = 2, there are two independent
v-type redundancies and one more R-type. The following two vectors and a matrix
can be taken as a basis for these redundancies:

$\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$
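For d = 3, there are three independent v-type redundancies and three R-type. Here
are three vectors and three matrices that can be taken as a basis for the space of
redundancies:

$\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix},$

(16.5)  $\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 0 & 0 & -1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}.$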

In general, there are $d + d(d-1)/2 = d(d+1)/2$ independent redundancies.
There could be more. But just as for network flows, where we showed that there
is one redundancy if and only if the network is connected, further redundancies
represent a defect in the underlying graph structure. In fact, we say that the graph is
stable if the rank of the incidence matrix A is exactly $md - d(d+1)/2$, that is, if
and only if the above redundancies account for the entire rank deficiency of A.
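The stability condition can be tested directly by computing the rank of A. The following sketch does this for the five-joint example using exact rational arithmetic; it relies on the fact that every member length in this particular example is an integer, and the function names are illustrative rather than taken from the text.

```python
from fractions import Fraction
from math import isqrt

p = {1: (0, 0), 2: (6, 0), 3: (0, 8), 4: (6, 8), 5: (3, 12)}
members = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
d = 2

def unit(i, j):
    """Exact unit vector u_ij (valid here because all lengths are integers)."""
    diff = [b - a for a, b in zip(p[i], p[j])]
    squared = sum(c * c for c in diff)
    length = isqrt(squared)
    assert length * length == squared, "member length is not an integer"
    return [Fraction(c, length) for c in diff]

def incidence_matrix():
    """md-by-|A| matrix with u_ij in block row i and u_ji in block row j."""
    joints = sorted(p)
    A = [[Fraction(0)] * len(members) for _ in range(d * len(joints))]
    for col, (i, j) in enumerate(members):
        for joint, vec in ((i, unit(i, j)), (j, unit(j, i))):
            row0 = d * joints.index(joint)
            for k in range(d):
                A[row0 + k][col] = vec[k]
    return A

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

m = len(p)
required = m * d - d * (d + 1) // 2   # md - d(d+1)/2
print(rank(incidence_matrix()), required)
```

For this example the required rank is 2·5 − 3 = 7, and the structure has 8 members, one more than that number.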


4.  Conservation  Laws 

Recall that for network flows, not all choices of supplies/demands yield feasible
flows. For connected networks, it is necessary and sufficient that the total supply
equals the total demand. The situation is similar here. The analogous question is:
which external loads give rise to solutions to (16.2)? We have already identified
several row vectors $y^T$ for which $y^T A = 0$. Clearly, in order to have a solution
to (16.2), it is necessary that $y^T b = 0$ for all these row vectors. In particular, for
every $v \in \mathbb{R}^d$, we see that b must satisfy the following condition:

$\sum_i v^T b_i = 0.$

Bringing the sum inside of the product, we get

$v^T \Bigl( \sum_i b_i \Bigr) = 0.$

Since this must hold for every d-vector v, it follows that

$\sum_i b_i = 0.$

This condition has a simple physical interpretation: the loads, taken in total, must
balance.

What about the choices of $y^T$ arising from skew symmetric matrices? We shall
show that these choices impose the conditions necessary to prevent the structure
from spinning around some axis of rotation. To show that this is so, let us first
consider the two-dimensional case. For every 2 × 2 skew symmetric matrix R, the
load vectors $b_i$, $i \in \mathcal{N}$, must satisfy

(16.6)  $\sum_i (R p_i)^T b_i = 0.$

This expression is a sum of terms of the form $(Rp)^T b$, where p is the position vector
of a point and b is a force applied at this point. We claim that $(Rp)^T b$ is precisely
the torque about the origin created by applying force b at location p. To see this
connection between the algebraic expression and its physical interpretation, first
decompose p into the product of its length r times a unit vector v pointing in the
same direction and rewrite the algebraic expression as

$(Rp)^T b = r (Rv)^T b.$

Now, without loss of generality, we may assume that R is the “basis” matrix for the
space of skew symmetric matrices,

$R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$

This matrix has the additional property that its two columns are unit vectors that are
orthogonal to each other. That is, $R^T R = I$. Hence,

$\|Rv\|^2 = \|v\|^2 = 1.$

Furthermore, property (16.4) tells us that Rv is orthogonal to v. Therefore, the
product $(Rv)^T b$ is the length of the projection of b in the direction of Rv, and so
$r(Rv)^T b$ is the distance from the origin (of the coordinate system) to p, which is
called the moment arm, times the component of the force that is orthogonal to the
moment arm in the direction of Rv (see Figure 16.2). This interpretation for each
summand in (16.6) shows that it is exactly the torque around the rotation axis passing
through the origin of the coordinate system caused by the force $b_i$ applied to
joint i. In d = 2, there is only one rotation around the origin. This fact corresponds
to the fact that the dimension of the space of skew symmetric matrices in two dimensions
is 1. Also, stipulating that the total torque about the origin vanishes implies
that the total torque around any other point also vanishes (see Exercise 16.4).

Figure 16.2. The ith summand in (16.6) is the length of $p_i$ times
the length of the projection of $b_i$ onto the direction given by $Rv_i$.
This is precisely the torque around an axis at 0 caused by the force
$b_i$ applied at joint i.
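The two conservation laws for d = 2 (zero total force and zero total torque) are easy to test for a proposed set of loads. The sketch below is illustrative: the two load sets are hypothetical and do not come from the text, and the torque uses the identity $(Rp)^T b = p_1 b_2 - p_2 b_1$ for the 2 × 2 basis matrix R with rows (0, −1) and (1, 0).

```python
p = {1: (0.0, 0.0), 2: (6.0, 0.0), 3: (0.0, 8.0), 4: (6.0, 8.0), 5: (3.0, 12.0)}

def total_force(loads):
    """Componentwise sum of the applied load vectors b_i."""
    return tuple(sum(b[k] for b in loads.values()) for k in range(2))

def total_torque(loads):
    """Sum over joints of (R p_i)^T b_i, which equals p1*b2 - p2*b1 per joint."""
    return sum(p[i][0] * b[1] - p[i][1] * b[0] for i, b in loads.items())

# Hypothetical loads: 2 units pulling down on joint 5, held up at joints 1 and 2.
balanced = {1: (0.0, 1.0), 2: (0.0, 1.0), 5: (0.0, -2.0)}

# Hypothetical loads whose forces cancel but which would make the structure spin.
spinning = {1: (-1.0, 0.0), 5: (1.0, 0.0)}

print(total_force(balanced), total_torque(balanced))
print(total_force(spinning), total_torque(spinning))
```

The second load set shows why force balance alone is not enough: the forces sum to zero, yet the torque about the origin is −12, so (16.2) can have no solution for it.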

The situation for d > 2, in particular for d = 3, is slightly more complicated.
Algebraically, the complications arise because the basic skew symmetric matrices
no longer satisfy $R^T R = I$. Physically, the complications stem from the fact that
in two dimensions rotation takes place around a point, whereas in three dimensions
it takes place around an axis. We shall explain how to resolve the complications for
d = 3. The extension to higher dimensions is straightforward (and perhaps not so
important). The basic conclusion that we wish to derive is the same, namely that
for basic skew symmetric matrices, the expression $(Rp)^T b$ represents the torque
generated by applying a force b at point p. Recall that there are just three basic skew
symmetric matrices, and they are given by (16.5). To be specific, let us just study
the first one:

$R = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$
This matrix can be decomposed into the product of two matrices:

$R = UP,$

where

$U = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad \text{and} \qquad P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$

The matrix U has the property that R had before, namely,

$U^T U = I.$

Such matrices are called unitary. The matrix P is a projection matrix. If we let

$q = Pp, \qquad v = \frac{q}{\|q\|}, \qquad \text{and} \qquad r = \|q\|,$

then we can rewrite $(Rp)^T b$ as

$(Rp)^T b = r (Uv)^T b.$

Figure 16.3. The decomposition of $(Rp)^T b$ into the product of
a moment arm r times the component of b in the direction Uv
shows that it is precisely the torque around the third axis.


Since v is a unit vector and U is unitary, it follows that Uv is a unit vector. Hence,
$(Uv)^T b$ represents the scalar projection of b onto the direction determined by Uv.
Also, it is easy to check that Uv is orthogonal to v. At this point, we can consult
Figure 16.3 to see that r is the moment arm for the torque around the third coordinate
axis and $(Uv)^T b$ is the component of force in the direction of rotation around this
axis. Therefore, the product is precisely the torque around this axis. As we know,
for d = 3, there are three independent axes of rotation, namely, pitch, roll, and
yaw. These axes correspond to the three basis matrices for the space of 3 × 3 skew
symmetric matrices (the one we have just studied corresponds to the yaw axis).

Finally, we note that (16.6) simply states that the total torque around each axis
of rotation must vanish. This means that the forces cannot be chosen to make the
system spin.

5. Minimum-Weight Structural Design

For a structure with m nodes, the system of force balance equations (16.2) has
md equations. But, as we now know, if the structure is stable, there are exactly
$d(d+1)/2$ redundant equations. That is, the rank of A is $md - d(d+1)/2$. Clearly,
the structure must contain at least this many members. We say that the structure
is a truss if it is stable and has exactly $md - d(d+1)/2$ members. In this case,
the force balance equations have a unique solution (assuming, of course, that the
total applied force and the total applied torque around each axis vanish). From an
optimization point of view, trusses are not interesting because they leave nothing to
optimize: one only needs to calculate.

To  obtain  an  interesting  optimization  problem,  we  assume  that  the  proposed 
structure  has  more  members  than  the  minimum  required  to  form  a  truss.  In  this  set¬ 
ting,  we  introduce  an  optimization  criterion  to  pick  that  solution  (whether  a  truss  or 
otherwise)  that  minimizes  the  criterion.  For  us,  we  shall  attempt  to  minimize  total 
weight.  To  keep  things  simple,  we  assume  that  the  weight  of  a  member  is  directly 
proportional  to  its  volume  and  that  the  constant  of  proportionality  (the  density  of  the 
material)  is  the  same  for  each  member.  (These  assumptions  are  purely  for  notational 
convenience — a  real  engineer  would  certainly  include  these  constants  and  let  them 
vary  from  one  member  to  the  next).  Hence,  it  suffices  to  minimize  the  total  vol¬ 
ume.  The  volume  of  one  member,  say,  {i,  j},  is  its  length  =  \\pj  —  Pi\\  times  its 
cross-sectional  area.  Again,  to  keep  things  as  simple  as  possible,  we  assume  that  the 
cross-sectional  area  must  be  proportional  to  the  tension/compression  carried  by  the 
member  (members  carrying  big  loads  must  be  “fat” — otherwise  they  might  break). 
Let’s  set  the  constant  of  proportionality  arbitrarily  to  one.  Then  the  function  that  we 


should  minimize  is  just  the  sum  over  all  members  of  lij 
tion  problem  can  be  written  as  follows: 

E 


xij 


Hence,  our  optimiza- 


mimmize 


hj 


Xij 


subject  to 


Uij  Xij 


=  -b* 


i  =  1,  2, . . . ,  m. 


3- 


This  problem  is  not  a  linear  programming  problem:  the  constraints  are  linear,  but  the 
objective  function  involves  the  absolute  value  of  each  variable.  We  can,  however, 
convert  this  problem  to  a  linear  programming  problem  with  the  following  trick.  For 
each $\{i,j\} \in \mathcal{A}$, write $x_{ij}$ as the difference between two nonnegative variables:

\[ x_{ij} = x_{ij}^+ - x_{ij}^-, \qquad x_{ij}^+,\ x_{ij}^- \ge 0. \]

Think of $x_{ij}^+$ as the tension part of $x_{ij}$ and $x_{ij}^-$ as the compression part. The absolute
value can then be modeled as the sum of these components:

\[ |x_{ij}| = x_{ij}^+ + x_{ij}^-. \]

We allow both components to be positive at the same time, but no minimum-weight
solution will have any member with both components positive, since if there were
such a member, the tension component and the compression component could be
decreased simultaneously at the same rate without changing the force balance
equations but reducing the weight. This reduction contradicts the minimum-weight
assumption.

We  can  now  state  the  linear  programming  formulation  of  the  minimum  weight 
structural  design  problem  as  follows: 

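\[
\begin{array}{lll}
\text{minimize}   & \displaystyle\sum_{\{i,j\} \in \mathcal{A}} \left( l_{ij}\,x_{ij}^+ + l_{ij}\,x_{ij}^- \right) & \\[2ex]
\text{subject to} & \displaystyle\sum_{j:\,\{i,j\} \in \mathcal{A}} \left( u_{ij}\,x_{ij}^+ - u_{ij}\,x_{ij}^- \right) = -b_i, & i = 1, 2, \ldots, m \\[2ex]
                  & x_{ij}^+,\ x_{ij}^- \ge 0, & \{i,j\} \in \mathcal{A}.
\end{array}
\]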

In  terms  of  the  incidence  matrix,  each  column  must  now  be  written  down  twice, 
once  as  before  and  once  as  the  negative  of  before. 
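
The remark about writing each column twice can be made concrete with a small sketch. The Python fragment below is only an illustration: the function name `split_lp_data`, the ordering of the variables (all tension parts first, then all compression parts), and the two-member data are assumptions made here, not part of the text.

```python
def split_lp_data(U, lengths):
    """Return (cost, U_split) for the LP in the split variables (x+, x-).

    U       : list of rows; column k holds the coefficients u_ij of member k
    lengths : list of member lengths l_ij, one per column of U
    """
    # Each member contributes l_ij * x+ + l_ij * x- to the objective.
    cost = list(lengths) + list(lengths)
    # Each column appears twice: once as before, once negated.
    U_split = [row + [-u for u in row] for row in U]
    return cost, U_split


# Hypothetical data: 2 balance equations, 2 members.
U = [[1.0, 0.0],
     [0.0, -1.0]]
lengths = [1.0, 2.0]
cost, U_split = split_lp_data(U, lengths)
```

The resulting cost vector and matrix, together with nonnegativity of all variables, are the data one would hand to a linear programming solver.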

6.  Anchors  Away 

So  far  we  have  considered  structures  that  are  free  floating  in  the  sense  that 
even  though  loads  are  applied  at  various  joints,  we  have  not  assumed  that  any  of 
the  joints  are  anchored  to  a  large  object  such  as  the  Earth.  This  setup  is  fine  for 
structures  intended  for  a  rocket  or  a  space  station,  but  for  Earth-bound  applications 
it  is  generally  desired  to  anchor  some  joints.  It  is  trivial  to  modify  the  formulation 
we  have  already  given  to  cover  the  situation  where  some  of  the  joints  are  anchored. 
Indeed,  the  d  force  balance  equations  associated  with  an  anchored  joint  are  simply 
dropped  as  constraints,  since  the  Earth  supplies  whatever  counterbalancing  force  is 
needed.  Of  course,  one  can  consider  dropping  only  some  of  the  d  force  balance 
equations  associated  with  a  particular  joint.  In  this  case,  the  physical  interpretation 
is  quite  simple.  For  example,  in  two  dimensions  it  simply  means  that  the  joint  is 
allowed  to  roll  on  a  track  that  is  aligned  with  one  of  the  coordinate  directions  but  is 
not  allowed  to  move  off  the  track. 

If enough “independent” constraints are dropped (at least three in two dimensions
and at least six in three dimensions), then there are no longer any limitations on
the  applied  loads — the  structure  will  be  sufficiently  well  anchored  so  that  the  Earth 
will  apply  whatever  forces  are  needed  to  prevent  the  structure  from  moving.  This  is 
the  most  typical  scenario  under  which  these  problems  are  solved.  It  makes  setting 
up  the  problem  much  easier,  since  one  no  longer  needs  to  worry  about  supplying 
loads  that  can’t  be  balanced. 
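
In code, anchoring amounts to filtering the list of force balance equations. The sketch below assumes a hypothetical representation in which each equation is stored as a tuple (joint, axis, coefficients, right-hand side); the function name and the data are placeholders chosen for illustration only.

```python
def drop_anchored_rows(rows, anchored):
    """Keep only the force balance equations that are not anchored.

    rows     : list of (joint, axis, coefficients, rhs) tuples, one per equation
    anchored : set of (joint, axis) pairs whose equation is dropped;
               anchoring a joint fully means listing every axis for it
    """
    return [r for r in rows if (r[0], r[1]) not in anchored]


# Hypothetical 2-D data: joint 0 is fully anchored, and joint 1 rolls on a
# horizontal track, so only its vertical equation (axis 1) is dropped.
# The coefficient lists are placeholders, not a real structure.
rows = [
    (0, 0, [1.0, 0.0], 0.0),
    (0, 1, [0.0, 1.0], 0.0),
    (1, 0, [-1.0, 0.5], 0.0),
    (1, 1, [0.0, -1.0], 0.0),
    (2, 0, [0.0, -0.5], 0.0),
    (2, 1, [0.0, 1.0], -2.0),
]
anchored = {(0, 0), (0, 1), (1, 1)}
kept = drop_anchored_rows(rows, anchored)
```

Here three equations are dropped, which matches the minimum count quoted above for two dimensions.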

We end this chapter with one realistic example. Suppose the need exists to
design a bracket to support a hanging load at a fixed distance from a wall. This
bracket will be molded out of plastic, which means that the problem of finding an
optimal design belongs to the realm of continuum mechanics. However, we can get
an idea of the optimal shape by modeling the problem discretely (don’t tell anyone).
That is, we define a lattice of joints as shown in Figure 16.4 and introduce a set
of members from which the bracket can be constructed. Each joint has members
connecting it to several nearby joints. Figure 16.5 shows the members connected to
one specific joint. Each joint in the structure has this connection “topology” with,
of course, the understanding that joints close to the boundary do not have any
member for which the intended connecting joint does not exist. The highlighted joints
on the left side in Figure 16.4 are the anchored joints, and the highlighted joint on
the right side is the joint to which the hanging load is applied (by “hanging,” we
mean that the applied load points downward). The optimal solution is shown in
Figure 16.6. The thickness of each member is drawn in proportion to the square root of
the tension/compression in the member (since if the structure actually exists in three
dimensions, the diameter of a member would be proportional to the square root of
the cross-sectional area). Also, those members under compression are drawn in dark
gray, whereas those under tension are drawn in light gray. Note that the compression
members appear to cross the tension members at right angles. These curves are
called principal stresses. It is a fundamental result in continuum mechanics that the
principal tension stresses cross the principal compression stresses at right angles.
We have discovered this result using optimization.

Figure 16.4. The set of joints used for the discrete approximation to the bracket
design problem. The highlighted joints on the left are anchored to the wall, and the
highlighted joint on the right must support the hanging load.

Figure 16.5. The members connected to a single interior joint.

Figure 16.6. The minimum weight bracket.

Most nonexperts find the solution to this problem to be quite surprising, since
it covers such a large area. Yet it is indeed optimal. Also, one can see that the
continuum solution should be roughly in the shape of a leaf.

Exercises 

16.1  Show that a matrix $R$ is skew symmetric if and only if

\[ \xi^T R \xi = 0 \quad \text{for every vector } \xi. \]

16.2  Which of the structures shown in Figure 16.7 is stable? (Note: each
structure is shown embedded in a convenient coordinate system.)

16.3  Which of the structures shown in Figure 16.7 is a truss?

Figure 16.7. Structures for Exercises 16.2 and 16.3.

16.4  Assuming that the total applied force vanishes, show that total torque is
translation invariant. That is, for any vector $\xi \in \mathbb{R}^d$,

\[ \sum_i \left( R(p_i - \xi) \right)^T b_i = \sum_i (R p_i)^T b_i. \]

16.5  In three dimensions there are five regular (Platonic) solids. They are shown in
Figure 16.8 and have the following numbers of vertices and edges:

                      vertices   edges
      tetrahedron         4         6
      cube                8        12
      octahedron          6        12
      dodecahedron       20        30
      icosahedron        12        30

If one were to construct pin-jointed wire-frame models of these solids,
which ones would be stable?

Figure 16.8. The five regular solids.


Notes 

Structural  optimization  has  its  roots  in  Michell  (1904).  The  first  paper  in  which 
truss  design  was  formulated  as  a  linear  programming  problem  is  Dorn  et  al.  (1964). 
A  few  general  references  on  the  subject  include  Hemp  (1973),  Rozvany  (1989), 
Bendsøe et al. (1994), and Recski (1989).


Part  3 

Interior-Point  Methods 


There  is,  I  believe,  in  every  disposition  a 
tendency  to  some  particular  evil — a  natural 
defect,  which  not  even  the  best  education 

can  overcome. — J.  Austen 


CHAPTER  17 


The  Central  Path 


In  this  chapter,  we  begin  our  study  of  an  alternative  to  the  simplex  method  for 
solving  linear  programming  problems.  The  algorithm  we  are  going  to  introduce  is 
called  a  path-following  method.  It  belongs  to  a  class  of  methods  called  interior-point 
methods.  The  path-following  method  seems  to  be  the  simplest  and  most  natural  of 
all  the  methods  in  this  class,  so  in  this  book  we  focus  primarily  on  it.  Before  we 
can  introduce  this  method,  we  must  define  the  path  that  appears  in  the  name  of  the 
method.  This  path  is  called  the  central  path  and  is  the  subject  of  this  chapter.  Before 
discussing the central path, we must lay some groundwork by analyzing a nonlinear
problem, called the barrier problem, associated with the linear programming
problem that we wish to solve.


Warning:  Nonstandard  Notation  Ahead 

Starting  with  this  chapter,  given  a  lower-case  letter  denoting  a  vector  quantity, 
we  shall  use  the  upper-case  form  of  the  same  letter  to  denote  the  diagonal  matrix 
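whose diagonal entries are those of the corresponding vector. For example,

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\quad \Longrightarrow \quad
X = \begin{bmatrix} x_1 & & & \\ & x_2 & & \\ & & \ddots & \\ & & & x_n \end{bmatrix}.
\]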

This  notation  is  nonstandard  in  mathematics  at  large,  but  has  achieved  a  certain 
amount  of  acceptance  in  the  interior-point-methods  community. 


1.  The  Barrier  Problem 


In this chapter, we consider the linear programming problem expressed, as
usual, with inequality constraints and nonnegative variables:

\[
\begin{array}{ll}
\text{maximize}   & c^T x \\
\text{subject to} & Ax \le b \\
                  & x \ge 0.
\end{array}
\]

The corresponding dual problem is

\[
\begin{array}{ll}
\text{minimize}   & b^T y \\
\text{subject to} & A^T y \ge c \\
                  & y \ge 0.
\end{array}
\]

As usual, we add slack variables to convert both problems to equality form:

\[
\begin{array}{ll}
\text{maximize}   & c^T x \\
\text{subject to} & Ax + w = b \\
                  & x, w \ge 0
\end{array}
\tag{17.1}
\]

and

\[
\begin{array}{ll}
\text{minimize}   & b^T y \\
\text{subject to} & A^T y - z = c \\
                  & y, z \ge 0.
\end{array}
\]


Given  a  constrained  maximization  problem  where  some  of  the  constraints  are 
inequalities  (such  as  our  primal  linear  programming  problem),  one  can  consider 
replacing  any  inequality  constraint  with  an  extra  term  in  the  objective  function.  For 
example, in (17.1) we could remove the constraint that a specific variable, say, $x_j$,
is nonnegative by adding to the objective function a term that is negative infinity
when $x_j$ is negative and is zero otherwise. This reformulation doesn’t seem to be
particularly helpful, since this new objective function has an abrupt discontinuity
that, for example, prevents us from using calculus to study it. However, suppose we
replace this discontinuous function with another function that is negative infinity
when $x_j$ is negative but is finite for $x_j$ positive and approaches negative infinity as
$x_j$ approaches zero. In some sense this smooths out the discontinuity and perhaps
improves our ability to apply calculus to its study. The simplest such function is
the logarithm. Hence, for each variable, we introduce a new term in the objective
function that is just a constant times the logarithm of the variable:

\[
\begin{array}{ll}
\text{maximize}   & c^T x + \mu \displaystyle\sum_j \log x_j + \mu \sum_i \log w_i \\[1ex]
\text{subject to} & Ax + w = b.
\end{array}
\tag{17.2}
\]

This problem, while not equivalent to our original problem, seems not too different
either. In fact, as the parameter $\mu$, which we assume to be positive, gets small, it
appears that (17.2) becomes a better and better stand-in for (17.1). Problem (17.2)
is called the barrier problem associated with (17.1). Note that it is not really one
problem, but rather a whole family of problems indexed by the parameter $\mu$. Each of
these problems is a nonlinear programming problem because the objective function
is nonlinear. This nonlinear objective function is called a barrier function or, more
specifically, a logarithmic barrier function.

It is instructive to have in mind a geometric picture of the barrier function.
Recall that, for problems expressed in standard form, the set of feasible solutions is
a polyhedron with each face being characterized by the property that one of the
variables is zero. Hence, the barrier function is minus infinity on each face of the
polyhedron. Furthermore, it is finite in the interior of the polyhedron, and it
approaches minus infinity as the boundary is approached. Figure 17.1 shows some
level sets for the barrier function for a specific problem and a few different choices
of $\mu$. Notice that, for each $\mu$, the maximum is attained at an interior point, and as
$\mu$ gets closer to zero this interior point moves closer to the optimal solution of the
original linear programming problem (which is at the top vertex). Viewed as a
function of $\mu$, the set of optimal solutions to the barrier problems forms a path through
the interior of the polyhedron of feasible solutions. This path is called the central
path. Our aim is to study this central path. To this end, we need to develop some
machinery, referred to as Lagrange multipliers.

Figure 17.1. Parts (a) through (c) show level sets of the barrier function for three
values of $\mu$. For each value of $\mu$, four level sets are shown. The maximum value of
the barrier function is attained inside the innermost level set. The drawing in part
(d) shows the central path.

2.  Lagrange  Multipliers 

We wish to discuss briefly the general problem of maximizing a function
subject to one or more equality constraints. Here, the functions are permitted to be
nonlinear, but are assumed to be smooth, say, twice differentiable.

For the moment, suppose that there is a single constraint equation so that the
problem can be formally stated as

\[
\begin{array}{ll}
\text{maximize}   & f(x) \\
\text{subject to} & g(x) = 0.
\end{array}
\]

In this case, the geometry behind the problem is compelling (see Figure 17.2). The
gradient of $f$, denoted $\nabla f$, is a vector that points in the direction of most rapid
increase of $f$. For unconstrained optimization, we would simply set this vector equal
to zero to determine the so-called critical points of $f$, and the maximum, if it exists,
would have to be included in this set. However, given the constraint, $g(x) = 0$, it is
no longer correct to look at points where the gradient vanishes. Instead, the gradient
must be orthogonal to the set of feasible solutions $\{x : g(x) = 0\}$. Of course, at
each point $x$ in the feasible set, $\nabla g(x)$ is a vector that is orthogonal to the feasible
set at this point $x$. Hence, our new requirement for a point $x^*$ to be a critical point
is that it is feasible and that $\nabla f(x^*)$ be proportional to $\nabla g(x^*)$. Writing this out as
a system of equations, we have

\[
\begin{aligned}
g(x^*) &= 0 \\
\nabla f(x^*) &= y \nabla g(x^*).
\end{aligned}
\]

Here, $y$ is the proportionality constant. Note that it can be any real number,
either positive, negative, or zero. This proportionality constant is called a Lagrange
multiplier.

Figure 17.2. The concentric rings illustrate a few level sets of $f$. Clearly, at the
optimal solution, $x^*$, the gradient must be perpendicular to the feasible set.

Now consider several constraints:

\[
\begin{array}{ll}
\text{maximize}   & f(x) \\
\text{subject to} & g_1(x) = 0 \\
                  & g_2(x) = 0 \\
                  & \quad \vdots \\
                  & g_m(x) = 0.
\end{array}
\]

In this case, the feasible region is the intersection of $m$ hypersurfaces (see
Figure 17.3). The space orthogonal to the feasible set at a point $x$ is no longer a
one-dimensional set determined by a single gradient, but is instead a higher-dimensional
space (typically $m$), given by the span of the gradients. Hence, we require that
$\nabla f(x^*)$ lie in this span. This yields the following set of equations for a critical
point:

\begin{align*}
g(x^*) &= 0 \\
\nabla f(x^*) &= \sum_{i=1}^{m} y_i \nabla g_i(x^*). \tag{17.3}
\end{align*}

The derivation of these equations has been entirely geometric, but there is also a
simple algebraic formalism that yields the same equations. The idea is to introduce
the so-called Lagrangian function

\[ L(x, y) = f(x) - \sum_i y_i g_i(x) \]

and to look for its critical points over both $x$ and $y$. Since this is now an
unconstrained optimization problem, the critical points are determined by simply setting
all the first derivatives to zero:

\[
\begin{aligned}
\frac{\partial L}{\partial x_j} &= \frac{\partial f}{\partial x_j} - \sum_i y_i \frac{\partial g_i}{\partial x_j} = 0, && j = 1, 2, \ldots, n, \\
\frac{\partial L}{\partial y_i} &= -g_i = 0, && i = 1, 2, \ldots, m.
\end{aligned}
\]

Figure 17.3. The feasible set is the curve formed by the intersection of $g_1 = 0$
and $g_2 = 0$. The point $x^*$ is optimal, since the gradient of $f$ at that point is
perpendicular to the feasible set.

Writing these equations in vector notation, we see that they are exactly the same as
those derived using the geometric approach. These equations are usually referred to
as the first-order optimality conditions.
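
As a small worked illustration of these conditions (the particular functions are chosen here only as an example and do not come from the text), consider maximizing $f(x) = x_1 + x_2$ subject to the single constraint $g(x) = x_1^2 + x_2^2 - 2 = 0$. The Lagrangian and its first derivatives are:

```latex
\begin{align*}
L(x, y) &= x_1 + x_2 - y\,(x_1^2 + x_2^2 - 2), \\
\frac{\partial L}{\partial x_1} &= 1 - 2 y x_1 = 0, \qquad
\frac{\partial L}{\partial x_2} = 1 - 2 y x_2 = 0, \qquad
\frac{\partial L}{\partial y} = -(x_1^2 + x_2^2 - 2) = 0 .
\end{align*}
```

The first two equations give $x_1 = x_2 = 1/(2y)$, and substituting into the third gives $y^2 = 1/4$. Taking $y = 1/2$ yields $x = (1, 1)$ with $f = 2$, the constrained maximum, whereas $y = -1/2$ yields $x = (-1, -1)$ with $f = -2$, the constrained minimum. Both points satisfy the first-order conditions, so these conditions alone do not distinguish a maximum from a minimum.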

Determining whether a solution to the first-order optimality conditions is indeed
a global maximum as desired can be difficult. However, if the constraints are all
linear, the first step (which is often sufficient) is to look at the matrix of second
derivatives:

\[ Hf(x) = \left[ \frac{\partial^2 f}{\partial x_i \, \partial x_j} \right]. \]

This matrix is called the Hessian of $f$ at $x$. We have

THEOREM 17.1. If the constraints are linear, a critical point $x^*$ is a local
maximum if

\[ \xi^T Hf(x^*) \xi < 0 \tag{17.4} \]

for each $\xi \ne 0$ satisfying

\[ \xi^T \nabla g_i(x^*) = 0, \qquad i = 1, 2, \ldots, m. \tag{17.5} \]

PROOF. We start with the two-term Taylor series expansion of $f$ about $x^*$:

\[ f(x^* + \xi) = f(x^*) + \nabla f(x^*)^T \xi + \tfrac{1}{2} \xi^T Hf(x^*) \xi + o(\|\xi\|^2). \]

The vector $\xi$ represents a displacement from the current point $x^*$. The only
displacements that are relevant are those that lie in the feasible set. Hence, let $\xi$ be a direction
vector satisfying (17.5). From (17.3) and (17.5), we see that $\nabla f(x^*)^T \xi = 0$, and so

\[ f(x^* + \xi) = f(x^*) + \tfrac{1}{2} \xi^T Hf(x^*) \xi + o(\|\xi\|^2). \]

Employing (17.4) finishes the proof. □


It is worth remarking that if (17.4) is satisfied not just at $x^*$ but at all $x$, then $x^*$
is a unique global maximum.

In  the  next  section,  we  shall  use  Lagrange  multipliers  to  study  the  central  path 
defined  by  the  barrier  problem. 


3.  Lagrange  Multipliers  Applied  to  the  Barrier  Problem 

In  this  section,  we  shall  use  the  machinery  of  Lagrange  multipliers  to  study  the 
solution  to  the  barrier  problem.  In  particular,  we  will  show  that  (subject  to  some 
mild assumptions) for each value of the barrier parameter $\mu$, there is a unique
solution to the barrier problem. We will also show that as $\mu$ tends to zero, the solution to
the  barrier  problem  tends  to  the  solution  to  the  original  linear  programming  problem. 
In  the  course  of  our  study,  we  will  stumble  naturally  upon  the  central  path  for  the 
dual  problem.  Taken  together,  the  equations  defining  the  primal  and  the  dual  central 
paths  play  an  important  role,  and  so  we  will  introduce  the  notion  of  a  primal-dual 
central  path. 

We  begin  by  recalling  the  barrier  problem: 

\[
\begin{array}{ll}
\text{maximize}   & c^T x + \mu \displaystyle\sum_j \log x_j + \mu \sum_i \log w_i \\[1ex]
\text{subject to} & Ax + w = b.
\end{array}
\]

This is an equality-constrained optimization problem, and so it is a problem to which
we can apply the Lagrange multiplier tools developed in the previous section. The
Lagrangian for this problem is

\[ L(x, w, y) = c^T x + \mu \sum_j \log x_j + \mu \sum_i \log w_i + y^T (b - Ax - w). \]

Taking derivatives with respect to each variable and setting them to zero, we get the
first-order optimality conditions:

\[
\begin{aligned}
\frac{\partial L}{\partial x_j} &= c_j + \mu \frac{1}{x_j} - \sum_i y_i a_{ij} = 0, && j = 1, 2, \ldots, n, \\
\frac{\partial L}{\partial w_i} &= \mu \frac{1}{w_i} - y_i = 0, && i = 1, 2, \ldots, m, \\
\frac{\partial L}{\partial y_i} &= b_i - \sum_j a_{ij} x_j - w_i = 0, && i = 1, 2, \ldots, m.
\end{aligned}
\]

Writing these equations in matrix form, we get

\[
\begin{aligned}
A^T y - \mu X^{-1} e &= c \\
y &= \mu W^{-1} e \\
Ax + w &= b.
\end{aligned}
\]


Here,  as  warned  at  the  beginning  of  the  chapter,  X  denotes  the  diagonal  matrix 
whose  diagonal  entries  are  the  components  of  x,  and  similarly  for  W.  Also,  recall 
that  we  use  e  to  denote  the  vector  of  all  ones. 

Introducing an extra vector defined as $z = \mu X^{-1} e$, we can rewrite the
first-order optimality conditions like this:

\[
\begin{aligned}
Ax + w &= b \\
A^T y - z &= c \\
z &= \mu X^{-1} e \\
y &= \mu W^{-1} e.
\end{aligned}
\]


Finally, if we multiply the third equation through by $X$ and the fourth equation by
$W$, we arrive at a primal-dual symmetric form for writing these equations:

\[
\begin{array}{rcl}
Ax + w    & = & b \\
A^T y - z & = & c \\
XZe       & = & \mu e \\
YWe       & = & \mu e.
\end{array}
\tag{17.6}
\]

Note that the first equation is the equality constraint that appears in the primal
problem, while the second equation is the equality constraint for the dual problem.
Furthermore, writing the third and fourth equations out componentwise,

\[
\begin{aligned}
x_j z_j &= \mu, && j = 1, 2, \ldots, n, \\
y_i w_i &= \mu, && i = 1, 2, \ldots, m,
\end{aligned}
\]

we see that they are closely related to our old friend: complementarity. In fact, if
we set $\mu$ to zero, then they are exactly the usual complementarity conditions that
must be satisfied at optimality. For this reason, we call these last two equations the
$\mu$-complementarity conditions.

The first-order optimality conditions, as written in (17.6), give us $2n + 2m$
equations in $2n + 2m$ unknowns. If these equations were linear, they could be
solved using Gaussian elimination, and the entire subject of linear programming
would be no more difficult than solving systems of linear equations. But alas, they
are nonlinear, though just barely. The only nonlinear expressions in these equations
are simple multiplications such as $x_j z_j$. This is about the closest to being linear
that one could imagine. Yet, it is this nonlinearity that makes the subject of linear
programming nontrivial.

We must ask both whether a solution to (17.6) exists and, if so, whether it is unique. We
address these questions in reverse order.
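
To get a feel for system (17.6), it can help to solve it by hand on the smallest possible example. The sketch below assumes a hypothetical one-variable problem (maximize x subject to x ≤ 1, x ≥ 0), which is not taken from the text; the function names are likewise illustrative. For this problem the four equations reduce to a single quadratic in x.

```python
import math


def central_path_point(mu):
    """Solve (17.6) for: maximize x subject to x <= 1, x >= 0.

    Here A = [1], b = 1, c = 1, so the system reads
        x + w = 1,  y - z = 1,  x*z = mu,  y*w = mu.
    Eliminating w, y, z leaves x^2 + (2*mu - 1)*x - mu = 0; the root
    lying in (0, 1) is taken.
    """
    x = ((1.0 - 2.0 * mu) + math.sqrt(1.0 + 4.0 * mu * mu)) / 2.0
    w = 1.0 - x
    z = mu / x
    y = mu / w
    return x, w, y, z


def residuals(mu, x, w, y, z):
    """Left side minus right side for each of the four equations."""
    return (x + w - 1.0, y - z - 1.0, x * z - mu, y * w - mu)


for mu in (10.0, 1.0, 0.1, 0.001):
    x, w, y, z = central_path_point(mu)
    print(mu, round(x, 4), max(abs(r) for r in residuals(mu, x, w, y, z)))
```

In this toy problem x approaches the optimal solution x = 1 as mu shrinks and approaches 1/2, the midpoint of the feasible interval, as mu grows.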


4.  Second-Order  Information 


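To show that the solution, if it exists, must be unique, we use second-order
information on the barrier function:

\[ f(x, w) = c^T x + \mu \sum_j \log x_j + \mu \sum_i \log w_i. \tag{17.7} \]

The first derivatives are

\[
\begin{aligned}
\frac{\partial f}{\partial x_j} &= c_j + \frac{\mu}{x_j}, && j = 1, 2, \ldots, n, \\
\frac{\partial f}{\partial w_i} &= \frac{\mu}{w_i}, && i = 1, 2, \ldots, m,
\end{aligned}
\]

and the pure second derivatives are

\[
\begin{aligned}
\frac{\partial^2 f}{\partial x_j^2} &= -\frac{\mu}{x_j^2}, && j = 1, 2, \ldots, n, \\
\frac{\partial^2 f}{\partial w_i^2} &= -\frac{\mu}{w_i^2}, && i = 1, 2, \ldots, m.
\end{aligned}
\]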


All  the  mixed  second  derivatives  vanish.  Therefore,  the  Hessian  is  a  diagonal  matrix 
with  strictly  negative  entries.  Hence,  by  Theorem  17.1,  there  can  be  at  most  one 
critical  point  and,  if  it  exists,  it  is  a  global  maximum. 
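
The diagonal Hessian can be checked numerically. The following sketch evaluates the barrier function (17.7), the closed-form pure second derivatives, and a central finite-difference estimate for comparison. The function names, the step size `h`, and the data are assumptions made purely for illustration.

```python
import math


def barrier(c, mu, x, w):
    """Barrier function (17.7): c^T x + mu*sum(log x_j) + mu*sum(log w_i)."""
    return (sum(cj * xj for cj, xj in zip(c, x))
            + mu * sum(math.log(xj) for xj in x)
            + mu * sum(math.log(wi) for wi in w))


def hessian_diagonal(mu, x, w):
    """Pure second derivatives -mu/x_j^2 and -mu/w_i^2 (mixed ones vanish)."""
    return [-mu / xj ** 2 for xj in x] + [-mu / wi ** 2 for wi in w]


def second_difference(c, mu, x, w, k, h=1e-4):
    """Central finite-difference estimate of the k-th pure second derivative
    of the barrier function with respect to the stacked vector (x, w)."""
    v = list(x) + list(w)
    n = len(x)

    def f(u):
        return barrier(c, mu, u[:n], u[n:])

    up, down = v[:], v[:]
    up[k] += h
    down[k] -= h
    return (f(up) - 2.0 * f(v) + f(down)) / (h * h)


# Hypothetical data, chosen only for illustration.
c, mu = [2.0, -1.0], 0.5
x, w = [1.0, 2.0], [0.5]
diag = hessian_diagonal(mu, x, w)
```

Every entry of `diag` is negative whenever mu, x, and w are positive, which is the property the uniqueness argument relies on.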


5.  Existence 

So,  does  a  solution  to  the  barrier  problem  always  exist?  It  might  not.  Consider, 
for  example,  the  following  trivial  optimization  problem  on  the  nonnegative  half-line: 

\[
\begin{array}{ll}
\text{maximize}   & 0 \\
\text{subject to} & x \ge 0.
\end{array}
\]

For this problem, the barrier function is

\[ f(x) = \mu \log x, \]

which doesn’t have a maximum (or, less precisely, the maximum is infinity which
is attained at $x = \infty$). However, such examples are rare. For example, consider
modifying the objective function in this example to make $x = 0$ the unique optimal
solution:

\[
\begin{array}{ll}
\text{maximize}   & -x \\
\text{subject to} & x \ge 0.
\end{array}
\]

In this case, the barrier function is

\[ f(x) = -x + \mu \log x, \]

which is a function whose maximum is attained at $x = \mu$.
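
The location of this maximum follows from setting the derivative $-1 + \mu/x$ to zero. A crude numerical check is sketched below; the grid search, its bounds, and the function names are illustrative assumptions rather than anything prescribed by the text.

```python
import math


def f(x, mu):
    """Barrier function for: maximize -x subject to x >= 0."""
    return -x + mu * math.log(x)


def argmax_on_grid(mu, lo=1e-3, hi=5.0, steps=5000):
    """Crude grid search for the maximizer of f(., mu) on [lo, hi]."""
    best_x = lo
    for k in range(steps + 1):
        x = lo + (hi - lo) * k / steps
        if f(x, mu) > f(best_x, mu):
            best_x = x
    return best_x


for mu in (2.0, 1.0, 0.5, 0.1):
    print(mu, argmax_on_grid(mu))
```

Up to the grid spacing, the reported maximizer agrees with mu, and it moves toward the optimal solution x = 0 as mu decreases.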

In  general,  we  have  the  following  result: 

THEOREM 17.2. There exists a solution to the barrier problem if and only if
both the primal and the dual feasible regions have nonempty interior.


PROOF. The “only if” part is trivial and less important to us. Therefore, we only
prove the “if” part. To this end, suppose that both the primal and the dual feasible
regions have nonempty interior. This means that there exists a primal feasible point
$(\bar{x}, \bar{w})$ with $\bar{x} > 0$ and $\bar{w} > 0$ and there exists a dual feasible point $(\bar{y}, \bar{z})$ with $\bar{y} > 0$
and $\bar{z} > 0$ (see Footnote 1). Now, given any primal feasible point $(x, w)$, consider the expression
$\bar{z}^T x + \bar{y}^T w$. Replacing the primal and dual slack variables with their definitions,
we can rewrite this expression as follows:

\[
\begin{aligned}
\bar{z}^T x + \bar{y}^T w &= (A^T \bar{y} - c)^T x + \bar{y}^T (b - Ax) \\
                          &= b^T \bar{y} - c^T x.
\end{aligned}
\]

Solving this equation for the primal objective function $c^T x$, we get that

\[ c^T x = -\bar{z}^T x - \bar{y}^T w + b^T \bar{y}. \]

Therefore, the barrier function $f$ defined in equation (17.7) can be written as
follows:

\[
\begin{aligned}
f(x, w) &= c^T x + \mu \sum_j \log x_j + \mu \sum_i \log w_i \\
        &= \sum_j \left( -\bar{z}_j x_j + \mu \log x_j \right) \\
        &\quad + \sum_i \left( -\bar{y}_i w_i + \mu \log w_i \right) \\
        &\quad + b^T \bar{y}.
\end{aligned}
\]

Note that the last term is just a constant. Also, each summand in the two sums is a
function of just one variable. These functions all have the following general form:

\[ h(\xi) = -a \xi + \mu \log \xi, \qquad 0 < \xi < \infty, \]

where $a > 0$. Such functions have a unique maximum (at $\mu/a$) and tend to $-\infty$ as
$\xi$ tends to $\infty$. From these observations, it is easy to see that, for every constant $c$,
the set

\[ \{ (x, w) \in \mathbb{R}^{n+m} : f(x, w) \ge c \} \]

is bounded.

Put $\bar{f} = f(\bar{x}, \bar{w})$ and let

\[ \bar{P} = \{ (x, w) : Ax + w = b,\ x \ge 0,\ w \ge 0,\ f(x, w) \ge \bar{f} \}. \]

Clearly, $\bar{P}$ is nonempty, since it contains $(\bar{x}, \bar{w})$. From the discussion above, we see
that $\bar{P}$ is a bounded set.

This set is also closed. To see this, note that it is the intersection of three sets,

\[ \{ (x, w) : Ax + w = b \} \cap \{ (x, w) : x \ge 0,\ w \ge 0 \} \cap \{ (x, w) : f(x, w) \ge \bar{f} \}. \]

The first two of these sets are obviously closed. The third set is closed because it is
the inverse image of a closed set, $[\bar{f}, \infty]$, under a continuous mapping $f$. Finally,
the intersection of three closed sets is closed.

In Euclidean spaces, a closed bounded set is called compact. A well-known
theorem from real analysis about compact sets is that a continuous function on a
nonempty compact set attains its maximum. This means that there exists a point
in the compact set at which the function hits its maximum. Applying this theorem
to $f$ on $\bar{P}$, we see that $f$ does indeed attain its maximum on $\bar{P}$, and this implies it
attains its maximum on all of $\{ (x, w) : x > 0,\ w > 0 \}$, since $\bar{P}$ was by definition
that part of this domain on which $f$ takes large values (bigger than $\bar{f}$, anyway). This
completes the proof. □

Footnote 1. Recall that we write $\xi > 0$ to mean that $\xi_j > 0$ for all $j$.


We  summarize  our  main  result  in  the  following  corollary: 

COROLLARY 17.3. If a primal feasible set (or, for that matter, its dual) has a
nonempty interior and is bounded, then for each $\mu > 0$ there exists a unique solution

\[ (x_\mu, w_\mu, y_\mu, z_\mu) \]

to (17.6).

PROOF. Follows immediately from the previous theorem and Exercise 10.7. □


The path $\{ (x_\mu, w_\mu, y_\mu, z_\mu) : \mu > 0 \}$ is called the primal-dual central path.
It  plays  a  fundamental  role  in  interior-point  methods  for  linear  programming.  In 
the  next  chapter,  we  define  the  simplest  interior-point  method.  It  is  an  iterative 
procedure  that  at  each  iteration  attempts  to  move  toward  a  point  on  the  central  path 
that  is  closer  to  optimality  than  the  current  point. 


Exercises 

17.1  Compute and graph the central trajectory for the following problem:

\[
\begin{array}{ll}
\text{maximize}   & -x_1 + x_2 \\
\text{subject to} & x_2 \le 1 \\
                  & -x_1 \le -1 \\
                  & x_1, x_2 \ge 0.
\end{array}
\]

Hint: The primal and dual problems are the same; exploit this symmetry.

17.2  Let $\theta$ be a fixed parameter, $0 \le \theta \le \pi/2$, and consider the following
problem:

\[
\begin{array}{ll}
\text{maximize}   & (\cos\theta)\, x_1 + (\sin\theta)\, x_2 \\
\text{subject to} & x_1 \le 1 \\
                  & x_2 \le 1 \\
                  & x_1, x_2 \ge 0.
\end{array}
\]

Compute an explicit formula for the central path $(x_\mu, w_\mu, y_\mu, z_\mu)$, and
evaluate $\lim_{\mu \to \infty} x_\mu$ and $\lim_{\mu \to 0} x_\mu$.

17.3  Suppose that $\{ x : Ax \le b,\ x \ge 0 \}$ is bounded. Let $r \in \mathbb{R}^n$ and $s \in \mathbb{R}^m$ be
vectors with positive elements. By studying an appropriate barrier
function, show that there exists a unique solution to the following nonlinear
system:

\[
\begin{array}{rcl}
Ax + w     & = & b \\
A^T y - z  & = & c \\
XZe        & = & r \\
YWe        & = & s \\
x, y, z, w & > & 0.
\end{array}
\]


17.4  Consider the linear programming problem in equality form:

\[
\begin{array}{ll}
\text{maximize}   & \displaystyle\sum_j c_j x_j \\[1ex]
\text{subject to} & \displaystyle\sum_j a_j x_j = b \\[1ex]
                  & x_j \ge 0, \quad j = 1, 2, \ldots, n,
\end{array}
\tag{17.8}
\]

where each $a_j$ is a vector in $\mathbb{R}^m$, as is $b$. Consider the change of variables,

\[ x_j = \xi_j^2, \]

and the associated maximization problem:

\[
\begin{array}{ll}
\text{maximize}   & \displaystyle\sum_j c_j \xi_j^2 \\[1ex]
\text{subject to} & \displaystyle\sum_j a_j \xi_j^2 = b
\end{array}
\tag{17.9}
\]

(note that the nonnegativity constraints are no longer needed). Let $V$
denote the set of basic feasible solutions to (17.8), and let $W$ denote the set
of points $(\xi_1^2, \xi_2^2, \ldots, \xi_n^2)$ in $\mathbb{R}^n$ for which $(\xi_1, \xi_2, \ldots, \xi_n)$ is a solution
to the first-order optimality conditions for (17.9). Show that $V \subset W$.
What does this say about the possibility of using (17.9) as a vehicle to
solve (17.8)?


Notes 

Research  into  interior-point  methods  has  its  roots  in  the  work  of  Fiacco  and 
McCormick  (1968).  Interest  in  these  methods  exploded  after  the  appearance  of 
the  seminal  paper  Karmarkar  (1984).  Karmarkar’s  paper  uses  clever  ideas  from 
projective  geometry.  It  doesn’t  mention  anything  about  central  paths,  which  have 
become  fundamental  to  the  theory  of  interior-point  methods.  The  discovery  that 
Karmarkar’s  algorithm  has  connections  with  the  primal-dual  central  path  introduced 
in this chapter can be traced to Megiddo (1989). The notion of central points can be
traced to pre-Karmarkar times with the work of Huard (1967). D.A. Bayer and J.C.
Lagarias, in a pair of papers (Bayer and Lagarias 1989a,b), give an in-depth study
of the central path.

Deriving optimality conditions and giving conditions under which they are
necessary and sufficient to guarantee optimality is one of the main goals of nonlinear
programming. Standard texts on this subject include the books by Luenberger
(1984), Bertsekas (1995), and Nash and Sofer (1996).


CHAPTER  18 


A  Path-Following  Method 


In  this  chapter,  we  define  an  interior-point  method  for  linear  programming  that 
is  called  a  path-following  method.  Recall  that  for  the  simplex  method  we  required 
a  two-phase  solution  procedure.  The  path-following  method  is  a  one-phase  method. 
This  means  that  the  method  can  begin  from  a  point  that  is  neither  primal  nor  dual 
feasible  and  it  will  proceed  from  there  directly  to  the  optimal  solution.  Hence, 
we  start  with  an  arbitrary  choice  of  strictly  positive  values  for  all  the  primal  and 
dual  variables,  i.e.,  (x,w,y,z)  >  0,  and  then  iteratively  update  these  values  as 
follows: 

(1)  Estimate an appropriate value for $\mu$ (i.e., smaller than the “current” value
but not too small).

(2)  Compute step directions $(\Delta x, \Delta w, \Delta y, \Delta z)$ pointing approximately at
the point $(x_\mu, w_\mu, y_\mu, z_\mu)$ on the central path.

(3)  Compute a step length parameter $\theta$ such that the new point

    $\tilde{x} = x + \theta \Delta x$,   $\tilde{y} = y + \theta \Delta y$,
    $\tilde{w} = w + \theta \Delta w$,   $\tilde{z} = z + \theta \Delta z$

continues to have strictly positive components.

(4)  Replace $(x, w, y, z)$ with the new solution $(\tilde{x}, \tilde{w}, \tilde{y}, \tilde{z})$.

To fully define the path-following method, it suffices to make each of these four
steps precise. Since the second step is in some sense the most fundamental, we start
by describing that one, after which we turn our attention to the others.

1.  Computing  Step  Directions 

Our aim is to find $(\Delta x, \Delta w, \Delta y, \Delta z)$ such that the new point
$(x + \Delta x, w + \Delta w, y + \Delta y, z + \Delta z)$ lies approximately on the
primal-dual central path at the point $(x_\mu, w_\mu, y_\mu, z_\mu)$. Recalling the
defining equations for this point on the central path,

    $Ax + w = b$
    $A^T y - z = c$
    $XZe = \mu e$
    $YWe = \mu e$,

we see that the new point $(x + \Delta x, w + \Delta w, y + \Delta y, z + \Delta z)$, if it
were to lie exactly on the central path at $\mu$, would be defined by

    $A(x + \Delta x) + (w + \Delta w) = b$
    $A^T(y + \Delta y) - (z + \Delta z) = c$
    $(X + \Delta X)(Z + \Delta Z)e = \mu e$
    $(Y + \Delta Y)(W + \Delta W)e = \mu e$.

Thinking of $(x, w, y, z)$ as data and $(\Delta x, \Delta w, \Delta y, \Delta z)$ as unknowns,
we rewrite these equations with the unknowns on the left and the data on the right:

    $A \Delta x + \Delta w = b - Ax - w =: \rho$
    $A^T \Delta y - \Delta z = c - A^T y + z =: \sigma$
    $Z \Delta x + X \Delta z + \Delta X \Delta Z e = \mu e - XZe$
    $W \Delta y + Y \Delta w + \Delta Y \Delta W e = \mu e - YWe$.

Note that we have introduced abbreviated notations, $\rho$ and $\sigma$, for the first two
right-hand sides. These two vectors represent the primal infeasibility and the dual
infeasibility, respectively.

Now, just as before, these equations form a system of nonlinear equations (this
time for the “delta” variables). We want to have a linear system, so at this point we
simply drop the nonlinear terms to arrive at the following linear system:

    (18.1)  $A \Delta x + \Delta w = \rho$
    (18.2)  $A^T \Delta y - \Delta z = \sigma$
    (18.3)  $Z \Delta x + X \Delta z = \mu e - XZe$
    (18.4)  $W \Delta y + Y \Delta w = \mu e - YWe$.

This system of equations is a linear system of $2n + 2m$ equations in $2n + 2m$
unknowns. We will show later that this system is nonsingular (under the mild
assumption that $A$ has full rank) and therefore that it has a unique solution that defines
the step directions for the path-following method. Chapters 19 and 20 are devoted
to studying methods for efficiently solving systems of this type.
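
To make the system (18.1)-(18.4) concrete, here is a minimal Python sketch that assembles it as one dense matrix and solves it by Gaussian elimination. The function names `solve_dense` and `step_directions`, the ordering of the unknowns as (dx, dw, dy, dz), and the dense solve are illustrative assumptions, not the software that accompanies the book; efficient solution methods are the subject of Chapters 19 and 20.

```python
def solve_dense(M, rhs):
    """Solve M u = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    u = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (aug[i][n] - s) / aug[i][i]
    return u


def step_directions(A, b, c, x, w, y, z, mu):
    """Assemble (18.1)-(18.4), unknowns ordered (dx, dw, dy, dz), and solve."""
    m, n = len(A), len(A[0])
    N = 2 * n + 2 * m
    ox, ow, oy, oz = 0, n, n + m, n + 2 * m    # column offset of each block
    M = [[0.0] * N for _ in range(N)]
    rhs = [0.0] * N
    for i in range(m):                          # (18.1): A dx + dw = rho
        for j in range(n):
            M[i][ox + j] = A[i][j]
        M[i][ow + i] = 1.0
        rhs[i] = b[i] - sum(A[i][j] * x[j] for j in range(n)) - w[i]
    for j in range(n):                          # (18.2): A^T dy - dz = sigma
        r = m + j
        for i in range(m):
            M[r][oy + i] = A[i][j]
        M[r][oz + j] = -1.0
        rhs[r] = c[j] - sum(A[i][j] * y[i] for i in range(m)) + z[j]
    for j in range(n):                          # (18.3): Z dx + X dz = mu e - XZe
        r = m + n + j
        M[r][ox + j] = z[j]
        M[r][oz + j] = x[j]
        rhs[r] = mu - x[j] * z[j]
    for i in range(m):                          # (18.4): W dy + Y dw = mu e - YWe
        r = m + 2 * n + i
        M[r][oy + i] = w[i]
        M[r][ow + i] = y[i]
        rhs[r] = mu - y[i] * w[i]
    u = solve_dense(M, rhs)
    return u[ox:ow], u[ow:oy], u[oy:oz], u[oz:]


# Hypothetical one-variable example: maximize x subject to x <= 1, x >= 0.
dx, dw, dy, dz = step_directions([[1.0]], [1.0], [1.0],
                                 [1.0], [1.0], [1.0], [1.0], mu=0.1)
print(dx, dw, dy, dz)
```

For this toy problem the system can also be solved by hand: the four equations reduce to $\Delta x = 0$, $\Delta w = -1$, $\Delta y = 0.1$, $\Delta z = -0.9$.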

If  the  business  of  dropping  the  nonlinear  “delta”  terms  strikes  you  as  bold,  let 
us  remark  that  this  is  the  most  common  approach  to  solving  nonlinear  systems  of 
equations.  The  method  is  called  Newton’s  method.  It  is  described  briefly  in  the  next 
section. 


2.  Newton’s  Method 


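Given a function

    $F(\xi) = \begin{bmatrix} F_1(\xi) \\ F_2(\xi) \\ \vdots \\ F_N(\xi) \end{bmatrix}$,
    $\xi = \begin{bmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_N \end{bmatrix}$,

from $\mathbb{R}^N$ into $\mathbb{R}^N$, a common problem is to find a point
$\xi^* \in \mathbb{R}^N$ for which $F(\xi^*) = 0$. Such a point is called a root of $F$.
Newton’s method is an iterative method for solving this problem. One step of the
method is defined as follows. Given any point $\xi \in \mathbb{R}^N$, the goal is to find a
step direction $\Delta \xi$ for which $F(\xi + \Delta \xi) = 0$. Of course, for a nonlinear
$F$ it is not possible to find such a step direction. Hence, $F(\xi + \Delta \xi)$ is
approximated by the first two terms of its Taylor’s series expansion,

    $F(\xi + \Delta \xi) \approx F(\xi) + F'(\xi) \Delta \xi$,

where

    $F'(\xi) = \begin{bmatrix}
        \partial F_1 / \partial \xi_1 & \partial F_1 / \partial \xi_2 & \cdots & \partial F_1 / \partial \xi_N \\
        \partial F_2 / \partial \xi_1 & \partial F_2 / \partial \xi_2 & \cdots & \partial F_2 / \partial \xi_N \\
        \vdots & \vdots & & \vdots \\
        \partial F_N / \partial \xi_1 & \partial F_N / \partial \xi_2 & \cdots & \partial F_N / \partial \xi_N
    \end{bmatrix}$.

The approximation is linear in $\Delta \xi$. Hence, equating it to zero gives a linear
system to solve for the step direction:

    $F'(\xi) \Delta \xi = -F(\xi)$.

Given $\Delta \xi$, Newton’s method updates the current solution $\xi$ by replacing it with
$\xi + \Delta \xi$. The process continues until the current solution is approximately a
root (i.e., $F(\xi) \approx 0$). Simple one-dimensional examples given in every elementary
calculus text illustrate that this method works well, when it works, but it can fail if
$F$ is not well behaved and the initial point is too far from a solution.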
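
As a small illustration of the iteration just described, the following Python sketch applies Newton’s method to a system of two equations in two unknowns. The test function (a circle of radius 2 intersected with the line $\xi_1 = \xi_2$), the starting point, and the name `newton` are assumptions chosen for illustration; the 2 x 2 linear system $F'(\xi) \Delta \xi = -F(\xi)$ is solved with Cramer’s rule to keep the example short.

```python
def newton(F, J, xi, tol=1e-12, max_iter=50):
    """Newton's method for two equations in two unknowns."""
    for _ in range(max_iter):
        f1, f2 = F(xi)
        if abs(f1) + abs(f2) < tol:       # xi is approximately a root
            break
        (a, b), (c, d) = J(xi)            # Jacobian F'(xi)
        det = a * d - b * c
        # Cramer's rule for F'(xi) * delta = -F(xi)
        d1 = (-f1 * d + f2 * b) / det
        d2 = (-a * f2 + c * f1) / det
        xi = (xi[0] + d1, xi[1] + d2)     # replace xi with xi + delta
    return xi


def F(xi):
    return (xi[0] ** 2 + xi[1] ** 2 - 4.0, xi[0] - xi[1])


def J(xi):
    return ((2.0 * xi[0], 2.0 * xi[1]), (1.0, -1.0))


root = newton(F, J, (1.0, 0.5))
print(root)   # both components approach sqrt(2)
```

Because the second equation is linear, the first Newton step satisfies it exactly; the remaining iterations behave like one-dimensional Newton on $2\xi_1^2 - 4 = 0$.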

Let’s return now to the problem of finding a point on the central path. Letting

    $\xi = \begin{bmatrix} x \\ w \\ y \\ z \end{bmatrix}$
    and
    $F(\xi) = \begin{bmatrix} Ax + w - b \\ A^T y - z - c \\ XZe - \mu e \\ YWe - \mu e \end{bmatrix}$,

we see that the set of equations defining $(x_\mu, w_\mu, y_\mu, z_\mu)$ is a root of $F$.
The matrix of derivatives of $F$ is given by

    $F'(\xi) = \begin{bmatrix}
        A & I & 0 & 0 \\
        0 & 0 & A^T & -I \\
        Z & 0 & 0 & X \\
        0 & Y & W & 0
    \end{bmatrix}$.

Noting that

    $\Delta \xi = \begin{bmatrix} \Delta x \\ \Delta w \\ \Delta y \\ \Delta z \end{bmatrix}$,

it is easy to see that the Newton direction coincides with the direction obtained by
solving equations (18.1)-(18.4).


3.  Estimating  an  Appropriate  Value  for  the  Barrier  Parameter 

We need to say how to pick $\mu$. If $\mu$ is chosen to be too large, then the sequence
could converge to the analytic center of the feasible set, which is not our intention.
If, on the other hand, $\mu$ is chosen to be too small, then the sequence could stray too
far from the central path and the algorithm could jam into the boundary of the
feasible set at a place that is suboptimal. The trick is to find a reasonable compromise
between these two extremes. To do this, we first figure out a value that represents,
in some sense, the current value of $\mu$ and we then choose something smaller than
that, say a fixed fraction of it.

We are given a point $(x, w, y, z)$ that is almost certainly off the central path.
If it were on the central path, then there are several formulas by which we could
recover the corresponding value of $\mu$. For example, we could just compute $z_j x_j$ for
any fixed index $j$. Or we could compute $y_i w_i$ for any fixed $i$. Or, perverse as it may
seem, we could average all these values:

    (18.5)  $\mu = \dfrac{z^T x + y^T w}{n + m}$.

This formula gives us exactly the value of $\mu$ whenever it is known that $(x, w, y, z)$
lies on the central path. The key point here then is that we will use this formula to
produce an estimate for $\mu$ even when the current solution $(x, w, y, z)$ does not lie on
the central path. Of course, the algorithm needs a value of $\mu$ that represents a point
closer to optimality than the current solution. Hence, the algorithm takes this “par”
value and reduces it by a certain fraction:

    $\mu = \delta \, \dfrac{z^T x + y^T w}{n + m}$,

where $\delta$ is a number between zero and one. In practice, one finds that setting $\delta$ to
approximately 1/10 works quite well, but for the sake of discussion we will always
leave it as a parameter.
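
The barrier-parameter rule above amounts to one line of arithmetic. The following Python sketch shows it; the function name `barrier_parameter` and the default `delta=0.1` are illustrative assumptions (the text leaves $\delta$ as a parameter).

```python
def barrier_parameter(x, w, y, z, delta=0.1):
    """Return delta * (z^T x + y^T w) / (n + m), the reduced form of (18.5)."""
    n, m = len(x), len(w)
    gamma = sum(zj * xj for zj, xj in zip(z, x)) \
          + sum(yi * wi for yi, wi in zip(y, w))
    return delta * gamma / (n + m)


# On the central path every product z_j x_j and y_i w_i equals mu, so with
# delta = 1 the formula recovers mu exactly (here mu = 1).
print(barrier_parameter([2.0, 4.0], [1.0], [1.0], [0.5, 0.25], delta=1.0))
```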


4.  Choosing  the  Step  Length  Parameter 

The step directions, which were determined using Newton’s method, were
determined under the assumption that the step length parameter $\theta$ would be equal to
one (i.e., $\tilde{x} = x + \Delta x$, etc.). But taking such a step might cause the new solution
to violate the property that every component of all the primal and the dual variables
must remain positive. Hence, we may need to use a smaller value for $\theta$. We need to
guarantee, for example, that

    $x_j + \theta \Delta x_j > 0$,   $j = 1, 2, \ldots, n$.
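
    initialize $(x, w, y, z) > 0$
    while (not optimal) {
        $\rho = b - Ax - w$
        $\sigma = c - A^T y + z$
        $\gamma = z^T x + y^T w$
        $\mu = \delta \, \dfrac{\gamma}{n + m}$
        solve:
            $A \Delta x + \Delta w = \rho$
            $A^T \Delta y - \Delta z = \sigma$
            $Z \Delta x + X \Delta z = \mu e - XZe$
            $W \Delta y + Y \Delta w = \mu e - YWe$
        $\theta = r \left( \max_{ij} \left\{ -\dfrac{\Delta x_j}{x_j}, -\dfrac{\Delta w_i}{w_i}, -\dfrac{\Delta y_i}{y_i}, -\dfrac{\Delta z_j}{z_j} \right\} \right)^{-1} \wedge 1$
        $x \leftarrow x + \theta \Delta x$,   $w \leftarrow w + \theta \Delta w$
        $y \leftarrow y + \theta \Delta y$,   $z \leftarrow z + \theta \Delta z$
    }

Figure 18.1. The path-following method.
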


Moving the $\Delta x_j$ term to the other side and then dividing through by $\theta$ and $x_j$,
both of which are positive, we see that $\theta$ must satisfy

    $\dfrac{1}{\theta} > -\dfrac{\Delta x_j}{x_j}$,   $j = 1, 2, \ldots, n$.

Of course, a similar inequality must be satisfied for the $w$, $y$, and $z$ variables too.
Putting it all together, the largest value of $\theta$ would be given by

    $\dfrac{1}{\theta} = \max_{ij} \left\{ -\dfrac{\Delta x_j}{x_j}, -\dfrac{\Delta w_i}{w_i}, -\dfrac{\Delta y_i}{y_i}, -\dfrac{\Delta z_j}{z_j} \right\}$,

where we have abused notation slightly by using the $\max_{ij}$ to denote the maximum
of all the ratios in the indicated set. However, this choice of $\theta$ will not guarantee
strict inequality, so we introduce a parameter $r$, which is a number close to but
strictly less than one, and we set

    (18.6)  $\theta = r \left( \max_{ij} \left\{ -\dfrac{\Delta x_j}{x_j}, -\dfrac{\Delta w_i}{w_i}, -\dfrac{\Delta y_i}{y_i}, -\dfrac{\Delta z_j}{z_j} \right\} \right)^{-1} \wedge 1$.

This formula may look messy, and no one should actually do it by hand, but it is
trivial to program a computer to do it. Such a subroutine will be really fast (requiring
only on the order of $2n + 2m$ operations).
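
A subroutine of the kind just mentioned might look as follows in Python. This is a sketch under assumptions: the name `step_length` is hypothetical, and the four vectors are assumed to have been concatenated into a single list `v` (current values) with a matching list `dv` (step directions).

```python
def step_length(v, dv, r=0.9):
    """Ratio test (18.6): theta = min(r / max_k(-dv_k / v_k), 1).

    v  : current strictly positive values of (x, w, y, z), concatenated
    dv : corresponding step directions, in the same order
    """
    worst = max(-d / p for p, d in zip(v, dv))
    # If worst <= r (in particular if no component decreases), r / worst
    # would be at least 1 or undefined, so the full step theta = 1 is taken.
    return 1.0 if worst <= r else r / worst


v = [1.0, 1.0, 1.0, 1.0]
dv = [0.0, -1.0, 0.1, -0.9]
theta = step_length(v, dv)
print(theta)
print([p + theta * d for p, d in zip(v, dv)])   # all components stay positive
```

The loop touches each of the $2n + 2m$ components once, which matches the operation count quoted above.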

A  summary  of  the  algorithm  is  shown  in  Figure  18.1.  In  the  next  section,  we 
investigate  whether  this  algorithm  actually  converges  to  an  optimal  solution. 
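
For readers who want to experiment, the loop of Figure 18.1 can be sketched in plain Python as below. This is not the software distributed with the book: the names `solve_dense` and `path_following`, the starting point of all ones, the dense Gaussian elimination, the tolerance `eps`, and the iteration cap are all assumptions made to keep the sketch self-contained.

```python
def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    u = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (aug[i][n] - s) / aug[i][i]
    return u


def path_following(A, b, c, delta=0.1, r=0.9, eps=1e-6, max_iter=200):
    """maximize c^T x subject to Ax + w = b, x, w >= 0 (sketch of Figure 18.1)."""
    m, n = len(A), len(A[0])
    x, w, y, z = [1.0] * n, [1.0] * m, [1.0] * m, [1.0] * n
    for _ in range(max_iter):
        rho = [b[i] - sum(A[i][j] * x[j] for j in range(n)) - w[i] for i in range(m)]
        sigma = [c[j] - sum(A[i][j] * y[i] for i in range(m)) + z[j] for j in range(n)]
        gamma = sum(z[j] * x[j] for j in range(n)) + sum(y[i] * w[i] for i in range(m))
        if sum(map(abs, rho)) < eps and sum(map(abs, sigma)) < eps and gamma < eps:
            break
        mu = delta * gamma / (n + m)
        # Unknowns ordered as dx (n), dw (m), dy (m), dz (n).
        N = 2 * n + 2 * m
        M = [[0.0] * N for _ in range(N)]
        rhs = (rho + sigma
               + [mu - x[j] * z[j] for j in range(n)]
               + [mu - y[i] * w[i] for i in range(m)])
        for i in range(m):
            for j in range(n):
                M[i][j] = A[i][j]                  # (18.1): A dx
                M[m + j][n + m + i] = A[i][j]      # (18.2): A^T dy
            M[i][n + i] = 1.0                      # (18.1): + dw
            M[m + 2 * n + i][n + m + i] = w[i]     # (18.4): W dy
            M[m + 2 * n + i][n + i] = y[i]         # (18.4): + Y dw
        for j in range(n):
            M[m + j][n + 2 * m + j] = -1.0         # (18.2): - dz
            M[m + n + j][j] = z[j]                 # (18.3): Z dx
            M[m + n + j][n + 2 * m + j] = x[j]     # (18.3): + X dz
        d = solve_dense(M, rhs)
        v = x + w + y + z
        worst = max(-dk / vk for vk, dk in zip(v, d))
        theta = 1.0 if worst <= r else r / worst   # ratio test (18.6)
        v = [vk + theta * dk for vk, dk in zip(v, d)]
        x, w, y, z = v[:n], v[n:n + m], v[n + m:n + 2 * m], v[n + 2 * m:]
    return x, w, y, z


# Hypothetical test problem: maximize 3 x1 + 2 x2
# subject to x1 + x2 <= 4, x1 + 3 x2 <= 6, x >= 0 (optimal vertex (4, 0)).
x, w, y, z = path_following([[1.0, 1.0], [1.0, 3.0]], [4.0, 6.0], [3.0, 2.0])
print(x, 3 * x[0] + 2 * x[1])
```

The dense $(2n + 2m) \times (2n + 2m)$ solve is the simplest choice, not an efficient one.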


1  For compactness, we use the notation $a \wedge b$ to represent the minimum of the two numbers $a$ and $b$.
5.  Convergence  Analysis


In this section, we investigate the convergence properties of the path-following
algorithm. Recall that the simplex method is a finite algorithm (assuming that steps
are taken to guard against cycling). For interior-point methods, the situation is
different. Every solution produced has all variables strictly positive. Yet for a solution
to be optimal generally requires many variables to vanish. This vanishing can only
happen “in the limit.” This raises questions, the most fundamental of which are
these: does the sequence of solutions produced by the path-following method
converge? If so, is the limit optimal? How fast is the convergence? In particular, if
we set “optimality tolerances,” how many iterations will it take to achieve these
tolerances? We will address these questions in this section.

In this section, we will need to measure the size of various vectors. There are
many choices. For example, for each $1 \le p < \infty$, we can define the so-called
p-norm of a vector $x$ as

    $\|x\|_p = \left( \sum_j |x_j|^p \right)^{1/p}$.

The limit as $p$ tends to infinity is also well defined, and it simplifies to the so-called
sup-norm:

    $\|x\|_\infty = \max_j |x_j|$.

5.1.  Measures  of  Progress.  Recall  from  duality  theory  that  there  are  three 
criteria  that  must  be  met  in  order  that  a  primal-dual  solution  be  optimal: 

(1)  Primal  feasibility, 

(2)  Dual  feasibility,  and 

(3)  Complementarity. 

For  each  of  these  criteria,  we  introduce  a  measure  of  the  extent  to  which  they  fail  to 
be  met. 

For the primal feasibility criterion, we use the 1-norm of the primal infeasibility
vector

    $\rho = b - Ax - w$.

For the dual feasibility criterion, we use the 1-norm of the dual infeasibility vector

    $\sigma = c - A^T y + z$.

For complementarity, we use

    $\gamma = z^T x + y^T w$.

5.2.  Progress in One Iteration. For the analysis in this section, we prefer to
modify the algorithm slightly by having it take shorter steps than specified before.
Indeed, we let

    (18.7)  $\theta = r \left( \max_{ij} \left\{ \left| \dfrac{\Delta x_j}{x_j} \right|, \left| \dfrac{\Delta w_i}{w_i} \right|, \left| \dfrac{\Delta y_i}{y_i} \right|, \left| \dfrac{\Delta z_j}{z_j} \right| \right\} \right)^{-1} \wedge 1$
            $= \dfrac{r}{\max(\|X^{-1} \Delta x\|_\infty, \ldots, \|Z^{-1} \Delta z\|_\infty)} \wedge 1$.

Note that the only change has been to replace the negative ratios with the absolute
value of the same ratios. Since the maximum of the absolute values can be larger
than the maximum of the ratios themselves, this formula produces a smaller value
for $\theta$. In this section, let $x$, $y$, etc., denote quantities from one iteration of the
algorithm, and put a tilde on the same letters to denote the same quantity at the next
iteration of the algorithm. Hence,

    $\tilde{x} = x + \theta \Delta x$,   $\tilde{y} = y + \theta \Delta y$,
    $\tilde{w} = w + \theta \Delta w$,   $\tilde{z} = z + \theta \Delta z$.

Now let’s compute some of the other quantities. We begin with the primal
infeasibility:

    $\tilde{\rho} = b - A\tilde{x} - \tilde{w}$
    $\phantom{\tilde{\rho}} = b - Ax - w - \theta (A \Delta x + \Delta w)$.

But $b - Ax - w$ equals the primal infeasibility $\rho$ (by definition) and $A \Delta x + \Delta w$
also equals $\rho$, since this is precisely the first equation in the system that defines the
“delta” variables. Hence,

    (18.8)  $\tilde{\rho} = (1 - \theta) \rho$.

Similarly,

    $\tilde{\sigma} = c - A^T \tilde{y} + \tilde{z}$
    $\phantom{\tilde{\sigma}} = c - A^T y + z - \theta (A^T \Delta y - \Delta z)$
    (18.9)  $= (1 - \theta) \sigma$.

Since $\theta$ is a number between zero and one, it follows that each iteration produces a
decrease in both the primal and the dual infeasibility and that this decrease is better
the closer $\theta$ is to one.

The analysis of the complementarity is a little more complicated (after all, this
is the part of the system where the linearization took place):

    $\tilde{\gamma} = \tilde{z}^T \tilde{x} + \tilde{y}^T \tilde{w}$
    $= (z + \theta \Delta z)^T (x + \theta \Delta x) + (y + \theta \Delta y)^T (w + \theta \Delta w)$
    $= z^T x + y^T w$
      $+ \theta (z^T \Delta x + \Delta z^T x + y^T \Delta w + \Delta y^T w)$
      $+ \theta^2 (\Delta z^T \Delta x + \Delta y^T \Delta w)$.

We need to analyze each of the $\theta$ terms separately. From (18.3), we see that

    $z^T \Delta x + \Delta z^T x = e^T (Z \Delta x + X \Delta z)$
    $= e^T (\mu e - ZXe)$
    $= \mu n - z^T x$.

Similarly, from (18.4), we have

    $y^T \Delta w + \Delta y^T w = e^T (Y \Delta w + W \Delta y)$
    $= e^T (\mu e - YWe)$
    $= \mu m - y^T w$.

Finally, (18.1) and (18.2) imply that

    $\Delta z^T \Delta x + \Delta y^T \Delta w = (A^T \Delta y - \sigma)^T \Delta x + \Delta y^T (\rho - A \Delta x)$
    $= \Delta y^T \rho - \sigma^T \Delta x$.

Substituting these expressions into the last expression for $\tilde{\gamma}$, we get

    $\tilde{\gamma} = z^T x + y^T w$
      $+ \theta \left( \mu (n + m) - (z^T x + y^T w) \right)$
      $+ \theta^2 (\Delta y^T \rho - \sigma^T \Delta x)$.

At this point, we recognize that $z^T x + y^T w = \gamma$ and that $\mu (n + m) = \delta \gamma$. Hence,

    $\tilde{\gamma} = (1 - (1 - \delta) \theta) \gamma + \theta^2 (\Delta y^T \rho - \sigma^T \Delta x)$.
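
The closed-form update for the complementarity can be checked numerically. The Python sketch below does so on a hypothetical one-variable problem (maximize $x$ subject to $x \le 1$, starting from $x = w = y = z = 1$ with $\delta = 0.1$, so $\mu = 0.1$); for that data the linear system (18.1)-(18.4) has the solution $\Delta x = 0$, $\Delta w = -1$, $\Delta y = 0.1$, $\Delta z = -0.9$, and the step length $\theta = 0.9$ is an assumed value. The function name `next_gamma` is illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def next_gamma(gamma, theta, delta, dy, rho, sigma, dx):
    """(1 - (1 - delta) theta) gamma + theta^2 (dy^T rho - sigma^T dx)."""
    return (1 - (1 - delta) * theta) * gamma \
        + theta ** 2 * (dot(dy, rho) - dot(sigma, dx))


# Data of the hypothetical example (A = [1], b = 1, c = 1).
x, w, y, z = [1.0], [1.0], [1.0], [1.0]
dx, dw, dy, dz = [0.0], [-1.0], [0.1], [-0.9]
rho, sigma = [-1.0], [1.0]            # b - Ax - w  and  c - A^T y + z
theta, delta = 0.9, 0.1
gamma = dot(z, x) + dot(y, w)

# Direct evaluation of z~^T x~ + y~^T w~ at the new point.
xn = [a + theta * d for a, d in zip(x, dx)]
wn = [a + theta * d for a, d in zip(w, dw)]
yn = [a + theta * d for a, d in zip(y, dy)]
zn = [a + theta * d for a, d in zip(z, dz)]
direct = dot(zn, xn) + dot(yn, wn)

print(direct, next_gamma(gamma, theta, delta, dy, rho, sigma, dx))
```

Both expressions evaluate to approximately 0.299 for this data.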

We must now abandon equalities and work with estimates. Our favorite tool for
estimation is the following inequality:

    $|v^T w| = \left| \sum_j v_j w_j \right|$
    $\le \sum_j |v_j| |w_j|$
    $\le \left( \max_j |v_j| \right) \left( \sum_j |w_j| \right)$
    $= \|v\|_\infty \|w\|_1$.

This inequality is the trivial case of Hölder’s inequality. From Hölder’s inequality,
we see that

    $|\Delta y^T \rho| \le \|\rho\|_1 \|\Delta y\|_\infty$   and   $|\sigma^T \Delta x| \le \|\sigma\|_1 \|\Delta x\|_\infty$.

Hence,

    $\tilde{\gamma} \le (1 - (1 - \delta) \theta) \gamma + \theta \left( \|\rho\|_1 \|\theta \Delta y\|_\infty + \|\sigma\|_1 \|\theta \Delta x\|_\infty \right)$.

Next, we use the specific choice of step length $\theta$ to get a bound on $\|\theta \Delta y\|_\infty$ and
$\|\theta \Delta x\|_\infty$. Indeed, (18.7) implies that

    $\theta \le \dfrac{r}{\|X^{-1} \Delta x\|_\infty} \le \dfrac{x_j}{|\Delta x_j|}$   for all $j$.

Hence,

    $\|\theta \Delta x\|_\infty \le \|x\|_\infty$.

Similarly,

    $\|\theta \Delta y\|_\infty \le \|y\|_\infty$.

If we now assume that, along the sequence of points $x$ and $y$ visited by the algorithm,
$\|x\|_\infty$ and $\|y\|_\infty$ are bounded by a large real number $M$, then we can estimate the
new complementarity as

    (18.10)  $\tilde{\gamma} \le (1 - (1 - \delta) \theta) \gamma + M \|\rho\|_1 + M \|\sigma\|_1$.

5.3.  Stopping Rule. Let $\epsilon > 0$ be a small positive tolerance, and let $M < \infty$
be a large finite tolerance. If $\|x\|_\infty$ gets larger than $M$, then we stop and declare the
problem primal unbounded. If $\|y\|_\infty$ gets larger than $M$, then we stop and declare
the problem dual unbounded. Finally, if $\|\rho\|_1 < \epsilon$, $\|\sigma\|_1 < \epsilon$, and $\gamma < \epsilon$, then
we stop and declare the current solution to be optimal (at least within this small
tolerance).

Since $\gamma$ is a measure of complementarity and complementarity is related to the
duality gap, one would expect that a small value of $\gamma$ should translate into a small
duality gap. This turns out to be true. Indeed, from the definitions of $\gamma$, $\sigma$, and $\rho$,
we can write

    $\gamma = z^T x + y^T w$
    $= (\sigma + A^T y - c)^T x + y^T (b - Ax - \rho)$
    $= b^T y - c^T x + \sigma^T x - \rho^T y$.

At this point, we use Hölder’s inequality to bound some of these terms to get an
estimate on the duality gap:

    $|b^T y - c^T x| \le \gamma + |\sigma^T x| + |y^T \rho|$
    $\le \gamma + \|\sigma\|_1 \|x\|_\infty + \|\rho\|_1 \|y\|_\infty$.

Now, if $\gamma$, $\|\sigma\|_1$, and $\|\rho\|_1$ are all small (and $\|x\|_\infty$ and $\|y\|_\infty$ are not too big),
then the duality gap will be small. This estimate shows that one shouldn’t expect
the duality gap to get small until the primal and the dual are very nearly feasible.
Actual implementations confirm this expectation.
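
The stopping rule translates directly into a small test. The following Python sketch is one way to write it; the function name `stopping_status`, the returned strings, and the default tolerances are assumptions made for illustration.

```python
def stopping_status(x, y, rho, sigma, gamma, eps=1e-8, big_m=1e8):
    """Apply the three tests of the stopping rule, in the order given."""
    if max(abs(v) for v in x) > big_m:        # sup-norm of x exceeds M
        return "primal unbounded"
    if max(abs(v) for v in y) > big_m:        # sup-norm of y exceeds M
        return "dual unbounded"
    small_rho = sum(abs(v) for v in rho) < eps       # 1-norm of rho
    small_sigma = sum(abs(v) for v in sigma) < eps   # 1-norm of sigma
    if small_rho and small_sigma and gamma < eps:
        return "optimal"
    return "continue"


print(stopping_status([4.0, 1e-10], [3.0, 1e-10], [1e-10, 0.0], [0.0, 1e-10], 1e-9))
print(stopping_status([1.0], [1.0], [-1.0], [1.0], 2.0))
```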


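5.4.  Progress Over Several Iterations. Now let $\rho^{(k)}$, $\sigma^{(k)}$, $\gamma^{(k)}$, $\theta^{(k)}$, etc.,
denote the values of these quantities at the $k$th iteration. We have the following
result about the overall performance of the algorithm:

THEOREM 18.1. Suppose there is a real number $t > 0$, a real number $M < \infty$,
and an integer $K$ such that for all $k \le K$,

    $\theta^{(k)} \ge t$,
    $\|x^{(k)}\|_\infty \le M$,
    $\|y^{(k)}\|_\infty \le M$.

Then there exists a constant $\bar{M} < \infty$ such that

    $\|\rho^{(k)}\|_1 \le (1 - t)^k \|\rho^{(0)}\|_1$,
    $\|\sigma^{(k)}\|_1 \le (1 - t)^k \|\sigma^{(0)}\|_1$,
    $\gamma^{(k)} \le (1 - \tilde{t})^k \bar{M}$,

for all $k \le K$ where

    $\tilde{t} = t(1 - \delta)$.

Proof. From (18.8) and the bound on $\theta^{(k)}$, it follows that

    $\|\rho^{(k)}\|_1 \le (1 - t) \|\rho^{(k-1)}\|_1 \le \cdots \le (1 - t)^k \|\rho^{(0)}\|_1$.

Similarly, from (18.9), it follows that

    $\|\sigma^{(k)}\|_1 \le (1 - t) \|\sigma^{(k-1)}\|_1 \le \cdots \le (1 - t)^k \|\sigma^{(0)}\|_1$.

As usual, $\gamma^{(k)}$ is harder to estimate. From (18.10) and the previous two estimates,
we see that

    (18.11)  $\gamma^{(k)} \le (1 - t(1 - \delta)) \gamma^{(k-1)} + M (1 - t)^{k-1} \left( \|\rho^{(0)}\|_1 + \|\sigma^{(0)}\|_1 \right)$
             $= (1 - \tilde{t}) \gamma^{(k-1)} + \tilde{M} (1 - t)^{k-1}$,

where $\tilde{M} = M \left( \|\rho^{(0)}\|_1 + \|\sigma^{(0)}\|_1 \right)$. Since an analogous inequality relates $\gamma^{(k-1)}$
to $\gamma^{(k-2)}$, we can substitute this analogous inequality into (18.11) to get

    $\gamma^{(k)} \le (1 - \tilde{t}) \left[ (1 - \tilde{t}) \gamma^{(k-2)} + \tilde{M} (1 - t)^{k-2} \right] + \tilde{M} (1 - t)^{k-1}$
    $= (1 - \tilde{t})^2 \gamma^{(k-2)} + \tilde{M} (1 - t)^{k-1} \left[ \dfrac{1 - \tilde{t}}{1 - t} + 1 \right]$.

Continuing in this manner, we see that

    $\gamma^{(k)} \le (1 - \tilde{t})^2 \left[ (1 - \tilde{t}) \gamma^{(k-3)} + \tilde{M} (1 - t)^{k-3} \right] + \tilde{M} (1 - t)^{k-1} \left[ \dfrac{1 - \tilde{t}}{1 - t} + 1 \right]$
    $= (1 - \tilde{t})^3 \gamma^{(k-3)} + \tilde{M} (1 - t)^{k-1} \left[ \left( \dfrac{1 - \tilde{t}}{1 - t} \right)^2 + \dfrac{1 - \tilde{t}}{1 - t} + 1 \right]$
    $\le \cdots \le$
    $\le (1 - \tilde{t})^k \gamma^{(0)} + \tilde{M} (1 - t)^{k-1} \left[ \left( \dfrac{1 - \tilde{t}}{1 - t} \right)^{k-1} + \cdots + \dfrac{1 - \tilde{t}}{1 - t} + 1 \right]$.

Now we sum the bracketed partial sum of a geometric series to get

    $(1 - t)^{k-1} \left[ \left( \dfrac{1 - \tilde{t}}{1 - t} \right)^{k-1} + \cdots + \dfrac{1 - \tilde{t}}{1 - t} + 1 \right]
      = (1 - t)^{k-1} \, \dfrac{1 - \left( \dfrac{1 - \tilde{t}}{1 - t} \right)^k}{1 - \dfrac{1 - \tilde{t}}{1 - t}}$
    $= \dfrac{(1 - \tilde{t})^k - (1 - t)^k}{t - \tilde{t}}$.

Recalling that $\tilde{t} = t(1 - \delta)$ and dropping the second term in the numerator, we get

    $\dfrac{(1 - \tilde{t})^k - (1 - t)^k}{t - \tilde{t}} \le \dfrac{(1 - \tilde{t})^k}{\delta t}$.

Putting this all together, we see that

    $\gamma^{(k)} \le (1 - \tilde{t})^k \left( \gamma^{(0)} + \dfrac{\tilde{M}}{\delta t} \right)$.

Denoting the parenthesized expression by $\bar{M}$ completes the proof.   □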

Theorem  18.1  is  only  a  partial  convergence  result  because  it  depends  on  the 
assumption  that  the  step  lengths  remain  bounded  away  from  zero.  To  show  that  the 
step  lengths  do  indeed  have  this  property  requires  that  the  algorithm  be  modified 
and  that  the  starting  point  be  carefully  selected.  The  details  are  rather  technical  and 
hence  omitted  (see  the  Notes  at  the  end  of  the  chapter  for  references). 

Also, before we leave this topic, note that the primal and dual infeasibilities go
down by a factor of $1 - t$ at each iteration, whereas the duality gap goes down by a
smaller amount $1 - \tilde{t}$. The fact that the duality gap converges more slowly than the
infeasibilities is also readily observed in practice.
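
To get a feel for the two rates, the following Python sketch counts how many iterations a geometric bound of the form $(1 - \text{rate})^k$ needs to fall below a tolerance. The function name `iterations_needed` and the sample values $t = 0.5$, $\delta = 0.1$, and tolerance $10^{-3}$ are assumptions for illustration; the count ignores the constants $\|\rho^{(0)}\|_1$, $\|\sigma^{(0)}\|_1$, and $\bar{M}$ that multiply the bounds in Theorem 18.1.

```python
import math


def iterations_needed(rate, eps):
    """Smallest integer k with (1 - rate)**k <= eps, for 0 < rate < 1, 0 < eps < 1."""
    return math.ceil(math.log(eps) / math.log(1.0 - rate))


t, delta, eps = 0.5, 0.1, 1e-3
print(iterations_needed(t, eps))                 # infeasibilities: factor 1 - t
print(iterations_needed(t * (1 - delta), eps))   # complementarity: factor 1 - t(1 - delta)
```

With these sample values the infeasibility bound needs 10 iterations and the complementarity bound needs 12, which illustrates the slower rate of the latter.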


Exercises 


18.1  Starting from $(x, w, y, z) = (e, e, e, e)$, and using $\delta = 1/10$, and $r = 9/10$,
compute $(x, w, y, z)$ after one step of the path-following method
for the problem given in

(a)  Exercise 2.3.

(b)  Exercise 2.4.

(c)  Exercise 2.5.

(d)  Exercise 2.10.

18.2  Let $\{(x_\mu, w_\mu, y_\mu, z_\mu) : \mu \ge 0\}$ denote the central trajectory. Show that

    $\lim_{\mu \to \infty} b^T y_\mu - c^T x_\mu = \infty$.

Hint: look at (18.5).


18.3  Consider  a  linear  programming  problem  whose  feasible  region  is  bounded 
and  has  nonempty  interior.  Use  the  result  of  Exercise  18.2  to  show  that 
the  dual  problem’s  feasible  set  is  unbounded. 
18.4  Scale invariance. Consider a linear program and its dual:

    (P)  max  $c^T x$                      (D)  min  $b^T y$
         s.t. $Ax + w = b$                      s.t. $A^T y - z = c$
              $x, w \ge 0$                           $y, z \ge 0$.

Let $R$ and $S$ be two given diagonal matrices having positive entries along
their diagonals. Consider the scaled reformulation of the original problem
and its dual:

    ($\bar{P}$)  max  $(Sc)^T \bar{x}$                      ($\bar{D}$)  min  $(Rb)^T \bar{y}$
         s.t. $RAS\bar{x} + \bar{w} = Rb$                   s.t. $SA^T R\bar{y} - \bar{z} = Sc$
              $\bar{x}, \bar{w} \ge 0$                           $\bar{y}, \bar{z} \ge 0$.

Let $(x^k, w^k, y^k, z^k)$ denote the sequence of solutions generated by the
primal-dual interior-point method applied to (P)-(D). Similarly, let
$(\bar{x}^k, \bar{w}^k, \bar{y}^k, \bar{z}^k)$ denote the sequence of solutions generated by the
primal-dual interior-point method applied to ($\bar{P}$)-($\bar{D}$). Suppose that we have the
following relations among the starting points:

    $\bar{x}^0 = S^{-1} x^0$,   $\bar{w}^0 = R w^0$,   $\bar{y}^0 = R^{-1} y^0$,   $\bar{z}^0 = S z^0$.

Show that these relations then persist. That is, for each $k \ge 1$,

    $\bar{x}^k = S^{-1} x^k$,   $\bar{w}^k = R w^k$,   $\bar{y}^k = R^{-1} y^k$,   $\bar{z}^k = S z^k$.

18.5  Homotopy method. Let $\bar{x}$, $\bar{y}$, $\bar{z}$, and $\bar{w}$ be given componentwise positive
“initial” values for $x$, $y$, $z$, and $w$, respectively. Let $t$ be a parameter
between 0 and 1. Consider the following nonlinear system:

             $Ax + w = tb + (1 - t)(A\bar{x} + \bar{w})$
             $A^T y - z = tc + (1 - t)(A^T \bar{y} - \bar{z})$
    (18.12)  $XZe = (1 - t) \bar{X} \bar{Z} e$
             $YWe = (1 - t) \bar{Y} \bar{W} e$
             $x, y, z, w > 0$.

(a)  Use Exercise 17.3 to show that this nonlinear system has a unique
solution for each $0 \le t < 1$. Denote it by $(x(t), y(t), z(t), w(t))$.

(b)  Show that $(x(0), y(0), z(0), w(0)) = (\bar{x}, \bar{y}, \bar{z}, \bar{w})$.

(c)  Assuming that the limit

    $(x(1), y(1), z(1), w(1)) = \lim_{t \to 1} (x(t), y(t), z(t), w(t))$

exists, show that it solves the standard-form linear programming problem.

(d)  The family of solutions $(x(t), y(t), z(t), w(t))$, $0 \le t < 1$, describes
a curve in “primal-dual” space. Show that the tangent to this curve at
$t = 0$ coincides with the path-following step direction at $(\bar{x}, \bar{y}, \bar{z}, \bar{w})$
computed with $\mu = 0$; that is,

    $\left( \dfrac{dx}{dt}(0), \dfrac{dy}{dt}(0), \dfrac{dz}{dt}(0), \dfrac{dw}{dt}(0) \right) = (\Delta x, \Delta y, \Delta z, \Delta w)$,

where $(\Delta x, \Delta y, \Delta z, \Delta w)$ is the solution to (18.1)-(18.4).

18.6  Higher-order methods. The previous exercise shows that the path-following
step direction can be thought of as the direction one gets by approximating
a homotopy path with its tangent line:

    $x(t) \approx x(0) + \dfrac{dx}{dt}(0) \, t$.

By using more terms of the Taylor’s series expansion, one can get a better
approximation:

    $x(t) \approx x(0) + \dfrac{dx}{dt}(0) \, t + \dfrac{1}{2} \dfrac{d^2 x}{dt^2}(0) \, t^2 + \cdots + \dfrac{1}{k!} \dfrac{d^k x}{dt^k}(0) \, t^k$.

(a)  Differentiating the equations in (18.12) twice, derive a linear system
for $(d^2x/dt^2(0), d^2y/dt^2(0), d^2z/dt^2(0), d^2w/dt^2(0))$.

(b)  Can the same technique be applied to derive linear systems for the
higher-order derivatives?

18.7  Linear Complementarity Problem. Given a $k \times k$ matrix $M$ and a $k$-vector
$q$, a vector $x$ is said to solve the linear complementarity problem if

    $-Mx + z = q$
    $XZe = 0$
    $x, z \ge 0$

(note that the first equation can be taken as the definition of $z$).

(a)  Show that the optimality conditions for linear programming can be
expressed as a linear complementarity problem with

    $M = \begin{bmatrix} 0 & -A \\ A^T & 0 \end{bmatrix}$.

(b)  The path-following method introduced in this chapter can be extended
to cover linear complementarity problems. The main step in
the derivation is to replace the complementarity condition $XZe = 0$
with a $\mu$-complementarity condition $XZe = \mu e$ and then to use
Newton’s method to derive step directions $\Delta x$ and $\Delta z$. Carry out
this procedure and indicate the system of equations that define $\Delta x$
and $\Delta z$.

(c)  Give  conditions  under  which  the  system  derived  above  is  guaranteed 
to  have  a  unique  solution. 

(d)  Write  down  the  steps  of  the  path-following  method  for  the  linear 
complementarity  problem. 

(e)  Study  the  convergence  of  this  algorithm  by  adapting  the  analysis 
given  in  Section  18.5. 
18.8  Consider again the $L^1$-regression problem:

    minimize  $\|b - Ax\|_1$.

Complete the following steps to derive the step direction vector $\Delta x$ associated
with the primal-dual affine-scaling method for solving this problem.

(a)  Show that the $L^1$-regression problem is equivalent to the following
linear programming problem:

             minimize    $e^T (t_+ + t_-)$
    (18.13)  subject to  $Ax + t_+ - t_- = b$
                         $t_+, t_- \ge 0$.

(b)  Write  down  the  dual  of  (18.13). 

(c)  Add  slack  and/or  surplus  variables  as  necessary  to  reformulate  the 
dual  so  that  all  inequalities  are  simple  nonnegativities  of  variables. 

(d)  Identify  all  primal-dual  pairs  of  complementary  variables. 

(e)  Write down the nonlinear system of equations consisting of: (1) the
primal equality constraints, (2) the dual equality constraints, (3) all
complementarity conditions (using $\mu = 0$ since we are looking for
an affine-scaling algorithm).

(f)  Apply  Newton’s  method  to  the  nonlinear  system  to  obtain  a  linear 
system  for  step  directions  for  all  of  the  primal  and  dual  variables. 

(g)  We  may  assume  without  loss  of  generality  that  both  the  initial  primal 
solution  and  the  initial  dual  solution  are  feasible.  Explain  why. 

(h)  The linear system derived above is a 6 x 6 block matrix system. But
it is easy to solve most of it by hand. First eliminate those step
directions associated with the nonnegative variables to arrive at a 2 x 2
block matrix system.

(i)  Next, solve the 2 x 2 system. Give an explicit formula for $\Delta x$.

(j)  How does this primal-dual affine-scaling algorithm compare with
the iteratively reweighted least squares algorithm defined in
Section 12.5?


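18.9  (a)  Let $\epsilon_j$, $j = 1, 2, \ldots$, denote a sequence of real numbers between zero
and one. Show that $\prod_j (1 - \epsilon_j) = 0$ if $\sum_j \epsilon_j = \infty$.

(b)  Use the result of part (a) to prove the following convergence result: if
the sequences $\|x^{(k)}\|_\infty$, $k = 1, 2, \ldots$, and $\|y^{(k)}\|_\infty$, $k = 1, 2, \ldots$,
are bounded and $\sum_k \theta^{(k)} = \infty$, then

    $\lim_{k \to \infty} \|\rho^{(k)}\|_1 = 0$
    $\lim_{k \to \infty} \|\sigma^{(k)}\|_1 = 0$
    $\lim_{k \to \infty} \gamma^{(k)} = 0$.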
k — yoo 
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Notes 

The  path-following  algorithm  introduced  in  this  chapter  has  its  origins  in  a  pa¬ 
per  by  Kojima  et  al.  (1989).  Their  paper  assumed  an  initial  feasible  solution  and 
therefore  was  a  true  interior-point  method.  The  method  given  in  this  chapter  does 
not  assume  the  initial  solution  is  feasible — it  is  a  one-phase  algorithm.  The  simple 
yet  beautiful  idea  of  modifying  the  Kojima-Mizuno-Yoshise  primal-dual  algorithm 
to  make  it  into  a  one-phase  algorithm  is  due  to  Lustig  (1990). 

Of the thousands of papers on interior-point methods that have appeared in
the last decade, the majority have included convergence proofs for some version
of an interior-point method. Here, we only mention a few of the important papers.
The first polynomial-time algorithm for linear programming was discovered
by Khachian (1979). Khachian’s algorithm is fundamentally different from any
algorithm presented in this book. Paradoxically, it proved in practice to be inferior to
the simplex method. N.K. Karmarkar’s pathbreaking paper (Karmarkar 1984)
contained a detailed convergence analysis. His claims, based on preliminary testing,
that his algorithm is uniformly substantially faster than the simplex method sparked
a revolution in linear programming. Unfortunately, his claims proved to be
exaggerated, but nonetheless interior-point methods have been shown to be competitive
with the simplex method and usually superior on very large problems. The
convergence proof for a primal-dual interior-point method was given by Kojima et al.
(1989). Shortly thereafter, Monteiro and Adler (1989) improved on the convergence
analysis. Two recent survey papers, Todd (1995) and Anstreicher (1996), give nice
overviews of the current state of the art. Also, a soon-to-be-published book by
Wright (1996) should prove to be a valuable reference to the reader wishing more
information on convergence properties of these algorithms.

The  homotopy  method  outlined  in  Exercise  18.5  is  described  in  Nazareth  (1986, 
1996). Higher-order path-following methods are described (differently) in
Carpenter et al. (1993).


CHAPTER  19 


The  KKT  System 


The most time-consuming aspect of each iteration of the path-following method
is solving the system of equations that defines the step direction vectors $\Delta x$, $\Delta y$,
$\Delta w$, and $\Delta z$:

(19.1)  $A\Delta x + \Delta w = \rho$

(19.2)  $A^T\Delta y - \Delta z = \sigma$

(19.3)  $Z\Delta x + X\Delta z = \mu e - XZe$

(19.4)  $W\Delta y + Y\Delta w = \mu e - YWe.$

After  minor  manipulation,  these  equations  can  be  written  in  block  matrix  form  as 
follows: 


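(19.5)
$$
\begin{bmatrix}
-XZ^{-1} &     & -I &         \\
         &     & A  & I       \\
-I       & A^T &    &         \\
         & I   &    & YW^{-1}
\end{bmatrix}
\begin{bmatrix} \Delta z \\ \Delta y \\ \Delta x \\ \Delta w \end{bmatrix}
=
\begin{bmatrix} -\mu Z^{-1}e + x \\ \rho \\ \sigma \\ \mu W^{-1}e - y \end{bmatrix}
$$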

This  system  is  called  the  Karush-Kuhn-Tucker  system,  or  simply  the  KKT  system. 
It is a symmetric linear system of $2n + 2m$ equations in $2n + 2m$ unknowns. One
could,  of  course,  perform  a  factorization  of  this  large  system  and  then  follow  that 
with  a  forward  and  backward  substitution  to  solve  the  system.  However,  it  is  better 
to  do  part  of  this  calculation  “by  hand”  first  and  only  use  a  factorization  routine  to 
help  solve  a  smaller  system.  There  are  two  stages  of  reductions  that  one  could  apply. 
After  the  first  stage,  the  remaining  system  is  called  the  reduced  KKT  system,  and 
after  the  second  stage  it  is  called  the  system  of  normal  equations.  We  shall  discuss 
these  two  systems  in  the  next  two  sections. 

1.  The  Reduced  KKT  System 

Equations (19.3) and (19.4) are trivial (in the sense that they only involve
diagonal matrices), and so it seems sensible to eliminate them right from the start. To
preserve the symmetry that we saw in (19.5), we should solve them for $\Delta z$ and $\Delta w$,
respectively:

$\Delta z = X^{-1}(\mu e - XZe - Z\Delta x)$

$\Delta w = Y^{-1}(\mu e - YWe - W\Delta y).$

Substituting these formulas into (19.1) and (19.2), we get the so-called reduced KKT
system:

(19.6)  $A\Delta x - Y^{-1}W\Delta y = \rho - \mu Y^{-1}e + w$

(19.7)  $A^T\Delta y + X^{-1}Z\Delta x = \sigma + \mu X^{-1}e - z.$

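Substituting in the definitions of $\rho$ and $\sigma$ (that is, $\rho = b - Ax - w$ and
$\sigma = c - A^Ty + z$) and writing the system in matrix notation, we get

$$
\begin{bmatrix} -Y^{-1}W & A \\ A^T & X^{-1}Z \end{bmatrix}
\begin{bmatrix} \Delta y \\ \Delta x \end{bmatrix}
=
\begin{bmatrix} b - Ax - \mu Y^{-1}e \\ c - A^Ty + \mu X^{-1}e \end{bmatrix}.
$$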

Note  that  the  reduced  KKT  matrix  is  again  a  symmetric  matrix.  Also,  the  right-hand 
side  displays  symmetry  between  the  primal  and  the  dual.  To  reduce  the  system  any 
further,  one  needs  to  break  the  symmetry  that  we  have  carefully  preserved  up  to  this 
point.  Nonetheless,  we  forge  ahead. 
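
The first stage of reduction can be checked on a tiny instance. The sketch below is only an illustration: the data (`A`, `b`, `c`, the current point `x, z, y, w`, and `mu`) are made-up values, and `solve` is a small exact Gauss-Jordan helper written for this example rather than a routine from the book's software. It solves the reduced KKT system for $(\Delta y, \Delta x)$, recovers $\Delta z$ and $\Delta w$ from (19.3) and (19.4), and confirms that (19.1) and (19.2) then hold.

```python
from fractions import Fraction as F

def solve(M, r):
    """Solve M u = r exactly by Gauss-Jordan elimination with row swaps."""
    n = len(M)
    T = [[F(v) for v in row] + [F(t)] for row, t in zip(M, r)]
    for k in range(n):
        p = next(i for i in range(k, n) if T[i][k] != 0)   # first row with a usable pivot
        T[k], T[p] = T[p], T[k]
        T[k] = [v / T[k][k] for v in T[k]]
        for i in range(n):
            if i != k:
                T[i] = [vi - T[i][k] * vk for vi, vk in zip(T[i], T[k])]
    return [row[n] for row in T]

# Hypothetical data: m = n = 2, with arbitrary positive x, z, y, w.
A = [[1, 2], [2, 1]]
b, c = [1, 1], [1, 1]
x, z = [F(1), F(2)], [F(1), F(4)]
y, w = [F(2), F(1)], [F(3), F(1)]
mu = F(1, 2)
m, n = len(A), len(A[0])

Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
ATy = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
rho = [b[i] - Ax[i] - w[i] for i in range(m)]
sigma = [c[j] - ATy[j] + z[j] for j in range(n)]

# Reduced KKT matrix [[-Y^{-1} W, A], [A^T, X^{-1} Z]] acting on (dy, dx).
K = [[-w[i] / y[i] if k == i else 0 for k in range(m)] + A[i] for i in range(m)]
K += [[A[i][j] for i in range(m)] + [z[j] / x[j] if k == j else 0 for k in range(n)]
      for j in range(n)]
rhs = [b[i] - Ax[i] - mu / y[i] for i in range(m)]
rhs += [c[j] - ATy[j] + mu / x[j] for j in range(n)]

step = solve(K, rhs)
dy, dx = step[:m], step[m:]
dz = [(mu - x[j] * z[j] - z[j] * dx[j]) / x[j] for j in range(n)]   # from (19.3)
dw = [(mu - y[i] * w[i] - w[i] * dy[i]) / y[i] for i in range(m)]   # from (19.4)

# Residuals of (19.1) and (19.2); both should vanish exactly.
res1 = [sum(A[i][j] * dx[j] for j in range(n)) + dw[i] - rho[i] for i in range(m)]
res2 = [sum(A[i][j] * dy[i] for i in range(m)) - dz[j] - sigma[j] for j in range(n)]
print(res1, res2)
```

Exact rational arithmetic is used so that the residuals are exactly zero; a practical code would of course use floating point and a sparse symmetric factorization instead.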


2.  The  Normal  Equations 

For the second stage of reduction, there are two choices: we could either (1)
solve (19.6) for $\Delta y$ and eliminate it from (19.7) or (2) solve (19.7) for $\Delta x$ and
eliminate it from (19.6). For the moment, let us assume that we follow the latter
approach. In this case, we get from (19.7) that

(19.8)  $\Delta x = XZ^{-1}(c - A^Ty + \mu X^{-1}e - A^T\Delta y),$

which we use to eliminate $\Delta x$ from (19.6) to get

(19.9)  $-(Y^{-1}W + AXZ^{-1}A^T)\Delta y = b - Ax - \mu Y^{-1}e - AXZ^{-1}(c - A^Ty + \mu X^{-1}e).$

This last system is a system of $m$ equations in $m$ unknowns. It is called the system
of normal equations in primal form. It is a system of equations involving the matrix
$Y^{-1}W + AXZ^{-1}A^T$. The $Y^{-1}W$ term is simply a diagonal matrix, and so the
real meat of this matrix is contained in the $AXZ^{-1}A^T$ term.

Given that $A$ is sparse (which is generally the case in real-world linear
programs), one would expect the matrix $AXZ^{-1}A^T$ to be likewise sparse. However,
we need to investigate the sparsity of $AXZ^{-1}A^T$ (or lack thereof) more closely.
Note that the $(i,j)$th element of this matrix is given by

$$
(AXZ^{-1}A^T)_{ij} = \sum_{k=1}^{n} a_{ik}\,\frac{x_k}{z_k}\,a_{jk}.
$$

That is, the $(i,j)$th element is simply a weighted inner product of rows $i$ and $j$ of
the $A$ matrix. If these rows have disjoint nonzero patterns, then this inner product is
guaranteed to be zero, but otherwise it must be treated as a potential nonzero. This
is bad news if $A$ is generally sparse but has, say, one dense column: every row then
has a nonzero in that column, so no two rows have disjoint nonzero patterns and
$AXZ^{-1}A^T$ must be treated as completely dense.
But don’t forget that we didn’t have to go the primal normal equations route.
Instead, we could have chosen the other alternative of solving (19.6) for $\Delta y$,

$\Delta y = -YW^{-1}(b - Ax - \mu Y^{-1}e - A\Delta x),$

and eliminating it from (19.7):

(19.10)  $(A^TYW^{-1}A + X^{-1}Z)\Delta x = c - A^Ty + \mu X^{-1}e + A^TYW^{-1}(b - Ax - \mu Y^{-1}e).$

The system defined by (19.10) is a system of $n$ equations in $n$ unknowns. It is called
the system of normal equations in dual form. Note that dense columns do not pose
a problem for these equations. Indeed, for the example given above, we now get


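[Display: sparsity pattern of the dual normal equations matrix $A^TYW^{-1}A + X^{-1}Z$ for this example, with nonzero entries marked by $*$.]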

While  this  system  is  larger  than  the  one  before,  it  is  also  sparse,  and  sparsity  almost 
always  is  more  important  than  matrix  dimensions.  In  this  example,  the  dense  matrix 
associated  with  the  primal  normal  equations  requires  65  arithmetic  operations  to 
factor,  whereas  the  larger,  sparser  matrix  associated  with  the  dual  normal  equations 
requires  just  60.  This  is  a  small  difference,  but  these  are  small  matrices.  As  the 
matrices involved get large, factoring a dense matrix takes on the order of $n^3$
operations, whereas a very sparse matrix might take only on the order of $n$ operations.
Clearly,  as  n  gets  large,  the  difference  between  these  becomes  quite  significant. 
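
The effect of a single dense column on the two sets of normal equations can be seen by comparing nonzero patterns. The following sketch uses a made-up $5 \times 6$ pattern (a dense first column plus one further nonzero per row), not the example from the text, and relies only on the observation above that an entry of the product is a potential nonzero exactly when the two rows (or columns) involved share a nonzero position.

```python
def pattern_gram(rows):
    """Nonzero pattern of R W R^T for a positive diagonal W: entry (i, j) is a
    potential nonzero exactly when rows i and j share a nonzero column."""
    return [[bool(ri & rj) for rj in rows] for ri in rows]

m, n = 5, 6
# Hypothetical pattern: row i of A has nonzeros in the dense column 0 and in column i + 1.
A_rows = [{0, i + 1} for i in range(m)]
# The same pattern stored by columns: A_cols[j] is the set of rows with a nonzero in column j.
A_cols = [{i for i in range(m) if j in A_rows[i]} for j in range(n)]

primal = pattern_gram(A_rows)   # pattern of A X Z^{-1} A^T   (m x m)
dual = pattern_gram(A_cols)     # pattern of A^T Y W^{-1} A   (n x n)

nnz = lambda P: sum(map(sum, P))
print("primal form:", nnz(primal), "potential nonzeros out of", m * m)
print("dual form:  ", nnz(dual), "potential nonzeros out of", n * n)
```

In this illustration the primal matrix is completely dense, while the dual matrix has nonzeros only in its first row, first column, and diagonal. Adding the diagonal terms $Y^{-1}W$ or $X^{-1}Z$ does not change either pattern, since the diagonals are already present.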

It  would  be  great  if  we  could  say  that  it  is  always  best  to  solve  the  primal  normal 
equations  or  the  dual  normal  equations.  But  as  we’ve  just  seen,  dense  columns  in  A 
are  bad  for  the  primal  normal  equations  and,  of  course,  it  follows  that  dense  rows 
are  bad  for  the  dual  normal  equations.  Even  worse,  some  problems  have  constraint 
matrices  A  that  are  overall  very  sparse  but  contain  some  dense  rows  and  some  dense 
columns. Such problems are sure to run into trouble with either set of normal equations.
For these reasons, it is best to factor the matrix in the reduced KKT system
directly.  Then  it  is  possible  to  find  pivot  orders  that  circumvent  the  difficulties  posed 
by  both  dense  columns  and  dense  rows. 

3.  Step  Direction  Decomposition

In the next chapter, we shall discuss factorization techniques for symmetric
matrices (along with other implementation issues). However, before we embark on
that discussion, we end this chapter by taking a closer look at the formulas for the
step direction vectors. To be specific, let us look at $\Delta x$. From the primal normal
equations (19.9), we can solve for $\Delta y$ and then substitute the solution into (19.8) to
get an explicit formula for $\Delta x$:

(19.11)
$$
\begin{aligned}
\Delta x ={}& \left(D^2 - D^2A^T(E^{-2} + AD^2A^T)^{-1}AD^2\right)(c - A^Ty + \mu X^{-1}e) \\
 &+ D^2A^T(E^{-2} + AD^2A^T)^{-1}(b - Ax - \mu Y^{-1}e),
\end{aligned}
$$

where we have denoted by $D$ the positive diagonal matrix defined by

$D^2 = XZ^{-1}$

and we have denoted by $E$ the positive diagonal matrix defined by

$E^2 = YW^{-1}$

(defining these matrices by their squares is possible, since the squares have positive
diagonal entries). However, using the dual normal equations, we get

(19.12)
$$
\begin{aligned}
\Delta x ={}& (A^TE^2A + D^{-2})^{-1}(c - A^Ty + \mu X^{-1}e) \\
 &+ (A^TE^2A + D^{-2})^{-1}A^TE^2(b - Ax - \mu Y^{-1}e).
\end{aligned}
$$

These two expressions for $\Delta x$ look entirely different, but they must be the same,
since we know that $\Delta x$ is uniquely defined by the reduced KKT system. They are
indeed  the  same,  as  can  be  shown  directly  by  establishing  a  certain  matrix  identity. 
This  is  the  subject  of  Exercise  19.1.  There  are  a  surprising  number  of  published 
research  papers  on  interior-point  methods  that  present  supposedly  new  algorithms 
that  are  in  fact  identical  to  existing  methods.  These  papers  get  published  because 
the  equivalence  is  not  immediately  obvious  (such  as  the  one  we  just  established). 
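
A numerical spot check is no substitute for the proof requested in Exercise 19.1, but it is a quick way to catch a sign error when implementing either formula. The sketch below evaluates (19.11) and (19.12) in exact rational arithmetic on made-up data. The helpers `mul`, `add`, `inv`, and `diag` are written for this illustration, and the vectors `rd` and `rp` simply stand in for $c - A^Ty + \mu X^{-1}e$ and $b - Ax - \mu Y^{-1}e$.

```python
from fractions import Fraction as F

def mul(P, Q):
    """Matrix product of two lists-of-rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def add(P, Q, s=1):
    """P + s * Q, entry by entry."""
    return [[p + s * q for p, q in zip(rp_, rq_)] for rp_, rq_ in zip(P, Q)]

def inv(M):
    """Exact inverse by Gauss-Jordan elimination on [M | I]."""
    n = len(M)
    T = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for k in range(n):
        p = next(i for i in range(k, n) if T[i][k] != 0)
        T[k], T[p] = T[p], T[k]
        T[k] = [v / T[k][k] for v in T[k]]
        for i in range(n):
            if i != k:
                T[i] = [vi - T[i][k] * vk for vi, vk in zip(T[i], T[k])]
    return [row[n:] for row in T]

def diag(d):
    return [[F(v) if i == j else F(0) for j in range(len(d))] for i, v in enumerate(d)]

# Hypothetical data with m = 2 constraints and n = 3 variables.
A = [[F(1), F(2), F(0)], [F(2), F(1), F(1)]]
AT = [list(col) for col in zip(*A)]
D2 = diag([F(1, 2), F(3), F(2)])          # D^2 = X Z^{-1}, any positive diagonal
E2 = diag([F(4), F(1, 3)])                # E^2 = Y W^{-1}, any positive diagonal
rd = [[F(1)], [F(-2)], [F(5)]]            # stands for c - A^T y + mu X^{-1} e
rp = [[F(3)], [F(-1)]]                    # stands for b - A x - mu Y^{-1} e

Kinv = inv(add(inv(E2), mul(mul(A, D2), AT)))      # (E^{-2} + A D^2 A^T)^{-1}
G = mul(mul(D2, AT), Kinv)                         # D^2 A^T (E^{-2} + A D^2 A^T)^{-1}
dx_primal = add(mul(add(D2, mul(mul(G, A), D2), -1), rd), mul(G, rp))     # (19.11)

Ninv = inv(add(mul(mul(AT, E2), A), inv(D2)))      # (A^T E^2 A + D^{-2})^{-1}
dx_dual = add(mul(Ninv, rd), mul(mul(mul(Ninv, AT), E2), rp))             # (19.12)

# Right-hand side of the identity in Exercise 19.1, with E^2 and D^2 in the roles of E and D.
smw_rhs = add(E2, mul(mul(mul(mul(E2, A), Ninv), AT), E2), -1)

print(dx_primal == dx_dual, Kinv == smw_rhs)
```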

We can gain further insight into the path-following method by looking more
closely at the primal step direction vector. Formula (19.11) can be rearranged as
follows:

$$
\begin{aligned}
\Delta x ={}& \left(D^2 - D^2A^T(E^{-2} + AD^2A^T)^{-1}AD^2\right)c \\
 &+ \mu\left(D^2 - D^2A^T(E^{-2} + AD^2A^T)^{-1}AD^2\right)X^{-1}e \\
 &- \mu D^2A^T(E^{-2} + AD^2A^T)^{-1}Y^{-1}e \\
 &+ D^2A^T(E^{-2} + AD^2A^T)^{-1}(b - Ax) \\
 &- D^2A^T\left(I - (E^{-2} + AD^2A^T)^{-1}AD^2A^T\right)y.
\end{aligned}
$$

For comparison purposes down the road, we prefer to write the $Y^{-1}e$ that appears in
the second term containing $\mu$ as $E^{-2}W^{-1}e$. Also, using the result of Exercise 19.2,
we can rewrite the bulk of the last line as follows:

$$
\begin{aligned}
\left(I - (E^{-2} + AD^2A^T)^{-1}AD^2A^T\right)y &= (E^{-2} + AD^2A^T)^{-1}E^{-2}y \\
 &= (E^{-2} + AD^2A^T)^{-1}w.
\end{aligned}
$$
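Putting this all together, we get

$$
\begin{aligned}
\Delta x ={}& \left(D^2 - D^2A^T(E^{-2} + AD^2A^T)^{-1}AD^2\right)c \\
 &+ \mu\left(D^2 - D^2A^T(E^{-2} + AD^2A^T)^{-1}AD^2\right)X^{-1}e \\
 &- \mu D^2A^T(E^{-2} + AD^2A^T)^{-1}E^{-2}W^{-1}e \\
 &+ D^2A^T(E^{-2} + AD^2A^T)^{-1}\rho \\
 ={}& \Delta x_{\mathrm{OPT}} + \mu\,\Delta x_{\mathrm{CTR}} + \Delta x_{\mathrm{FEAS}},
\end{aligned}
$$

where

$$
\Delta x_{\mathrm{OPT}} = \left(D^2 - D^2A^T(E^{-2} + AD^2A^T)^{-1}AD^2\right)c,
$$

$$
\begin{aligned}
\Delta x_{\mathrm{CTR}} ={}& \left(D^2 - D^2A^T(E^{-2} + AD^2A^T)^{-1}AD^2\right)X^{-1}e \\
 &- D^2A^T(E^{-2} + AD^2A^T)^{-1}E^{-2}W^{-1}e,
\end{aligned}
$$

and

$$
\Delta x_{\mathrm{FEAS}} = D^2A^T(E^{-2} + AD^2A^T)^{-1}\rho.
$$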

In Chapter 21, we shall show that these components of $\Delta x$ have important
connections to the step directions that arise in a related interior-point method called the
affine-scaling method. For now, we simply establish some of their properties as
they relate to the path-following method. Our first result is that $\Delta x_{\mathrm{OPT}}$ is an ascent
direction.


Theorem 19.1. $c^T\Delta x_{\mathrm{OPT}} \ge 0$.


Proof. We use the result of Exercise 19.1 (with the roles of $E$ and $D$ switched)
to see that

$\Delta x_{\mathrm{OPT}} = (A^TE^2A + D^{-2})^{-1}c.$

Hence,

$c^T\Delta x_{\mathrm{OPT}} = c^T(A^TE^2A + D^{-2})^{-1}c.$

We claim that the right-hand side is obviously nonnegative, since the matrix
sandwiched between $c$ and its transpose is positive semidefinite. (In fact, it’s positive
definite, but we don’t need this stronger property here.) Indeed, the claim
follows from the definition of positive semidefiniteness: a matrix $B$ is positive
semidefinite if $\xi^TB\xi \ge 0$ for all vectors $\xi$. To see that the matrix in question is in fact
positive semidefinite, we first note that $A^TE^2A$ and $D^{-2}$ are positive semidefinite:

$\xi^TA^TE^2A\xi = \|EA\xi\|^2 \ge 0$  and  $\xi^TD^{-2}\xi = \|D^{-1}\xi\|^2 \ge 0.$

Then we show that the sum of two positive semidefinite matrices is positive
semidefinite and finally that the inverse of a symmetric, invertible positive semidefinite
matrix is positive semidefinite. To verify closure under summation, suppose that $B^{(1)}$ and
$B^{(2)}$ are positive semidefinite, and then compute

$\xi^T(B^{(1)} + B^{(2)})\xi = \xi^TB^{(1)}\xi + \xi^TB^{(2)}\xi \ge 0.$

To verify closure under forming inverses of symmetric positive semidefinite matrices,
suppose that $B$ is symmetric, invertible, and positive semidefinite. Then

$\xi^TB^{-1}\xi = \xi^TB^{-1}BB^{-1}\xi = (B^{-1}\xi)^TB(B^{-1}\xi) \ge 0,$

where the inequality follows from the fact that $B$ is positive semidefinite and $B^{-1}\xi$
is simply any old vector. This completes the proof. □

The theorem just proved justifies our referring to $\Delta x_{\mathrm{OPT}}$ as a
step-toward-optimality direction. We next show that $\Delta x_{\mathrm{FEAS}}$ is in fact a step-toward-feasibility.

In Exercise 19.3, you are asked to find the formulas for the primal slack vector’s
step directions, $\Delta w_{\mathrm{OPT}}$, $\Delta w_{\mathrm{CTR}}$, and $\Delta w_{\mathrm{FEAS}}$. It is easy to verify from these formulas
that the pairs $(\Delta x_{\mathrm{OPT}}, \Delta w_{\mathrm{OPT}})$ and $(\Delta x_{\mathrm{CTR}}, \Delta w_{\mathrm{CTR}})$ preserve the current level of
infeasibility. That is,

$A\Delta x_{\mathrm{OPT}} + \Delta w_{\mathrm{OPT}} = 0$

and

$A\Delta x_{\mathrm{CTR}} + \Delta w_{\mathrm{CTR}} = 0.$

Hence, only the “feasibility” directions can improve the degree of feasibility.
Indeed, it is easy to check that

$A\Delta x_{\mathrm{FEAS}} + \Delta w_{\mathrm{FEAS}} = \rho.$

Finally, we consider $\Delta x_{\mathrm{CTR}}$. If the objective function were zero (i.e., $c = 0$)
and if the current point were feasible, then steps toward optimality and feasibility
would vanish and we would be left with just $\Delta x_{\mathrm{CTR}}$. Since our step directions were
derived in an effort to move toward a point on the central path parametrized by $\mu$,
we now see that $\Delta x_{\mathrm{CTR}}$ plays the role of a step-toward-centrality.
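
The decomposition can be checked numerically. The sketch below, on made-up data and with small exact-arithmetic helpers written for this illustration, computes the three components from their defining formulas, confirms that $\Delta x_{\mathrm{OPT}} + \mu\,\Delta x_{\mathrm{CTR}} + \Delta x_{\mathrm{FEAS}}$ reproduces the step given by (19.11), and evaluates $c^T\Delta x_{\mathrm{OPT}}$.

```python
from fractions import Fraction as F

def mul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def add(P, Q, s=1):
    """P + s * Q, entry by entry."""
    return [[p + s * q for p, q in zip(r1, r2)] for r1, r2 in zip(P, Q)]

def inv(M):
    """Exact inverse by Gauss-Jordan elimination on [M | I]."""
    n = len(M)
    T = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for k in range(n):
        p = next(i for i in range(k, n) if T[i][k] != 0)
        T[k], T[p] = T[p], T[k]
        T[k] = [v / T[k][k] for v in T[k]]
        for i in range(n):
            if i != k:
                T[i] = [vi - T[i][k] * vk for vi, vk in zip(T[i], T[k])]
    return [row[n:] for row in T]

def diag(d):
    return [[F(v) if i == j else F(0) for j in range(len(d))] for i, v in enumerate(d)]

col = lambda v: [[F(t)] for t in v]          # column vector from an iterable

# Hypothetical data: an arbitrary positive (and infeasible) current point.
A = [[F(1), F(2)], [F(2), F(1)]]
AT = [list(c_) for c_ in zip(*A)]
b, c = col([1, 1]), col([1, 1])
x, z, y, w = [F(1), F(2)], [F(1), F(4)], [F(2), F(1)], [F(3), F(1)]
mu = F(1, 2)

D2 = diag([xj / zj for xj, zj in zip(x, z)])           # D^2 = X Z^{-1}
E2inv = diag([wi / yi for wi, yi in zip(w, y)])        # E^{-2} = Y^{-1} W
G = mul(mul(D2, AT), inv(add(E2inv, mul(mul(A, D2), AT))))   # D^2 A^T (E^{-2} + A D^2 A^T)^{-1}
P = add(D2, mul(mul(G, A), D2), -1)                    # D^2 - D^2 A^T (...)^{-1} A D^2
rho = add(add(b, mul(A, col(x)), -1), col(w), -1)      # b - A x - w

dx_opt = mul(P, c)
dx_ctr = add(mul(P, col(1 / xj for xj in x)), mul(mul(G, E2inv), col(1 / wi for wi in w)), -1)
dx_feas = mul(G, rho)
dx_total = add(add(dx_opt, dx_ctr, mu), dx_feas)

# The same step computed directly from (19.11).
rd = add(add(c, mul(AT, col(y)), -1), col(1 / xj for xj in x), mu)
rp = add(add(b, mul(A, col(x)), -1), col(1 / yi for yi in y), -mu)
dx = add(mul(P, rd), mul(G, rp))

ascent = sum(ci[0] * di[0] for ci, di in zip(c, dx_opt))     # c^T dx_opt
print(dx_total == dx, ascent)
```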

Exercises 

19.1  Sherman-Morrison-Woodbury Formula. Assuming that all the inverses
below exist, show that the following identity is true:

$(E^{-1} + ADA^T)^{-1} = E - EA(A^TEA + D^{-1})^{-1}A^TE.$

Use this identity to verify directly the equivalence of the expressions given
for $\Delta x$ in (19.11) and (19.12).

19.2  Assuming that all the inverses exist, show that the following identity holds:

$I - (E + ADA^T)^{-1}ADA^T = (E + ADA^T)^{-1}E.$

19.3  Show that

$\Delta w = \Delta w_{\mathrm{OPT}} + \mu\,\Delta w_{\mathrm{CTR}} + \Delta w_{\mathrm{FEAS}},$

where

$$
\Delta w_{\mathrm{OPT}} = -A\left(D^2 - D^2A^T(E^{-2} + AD^2A^T)^{-1}AD^2\right)c,
$$

$$
\begin{aligned}
\Delta w_{\mathrm{CTR}} ={}& -A\left(D^2 - D^2A^T(E^{-2} + AD^2A^T)^{-1}AD^2\right)X^{-1}e \\
 &+ AD^2A^T(E^{-2} + AD^2A^T)^{-1}E^{-2}W^{-1}e,
\end{aligned}
$$

and

$$
\Delta w_{\mathrm{FEAS}} = \rho - AD^2A^T(E^{-2} + AD^2A^T)^{-1}\rho.
$$

Notes

The  KKT  system  for  general  inequality  constrained  optimization  problems  was 
derived  by  Kuhn  and  Tucker  (1951).  It  was  later  discovered  that  W.  Karush  had 
proven  the  same  result  in  his  1939  master’s  thesis  at  the  University  of  Chicago 
(Karush  1939).  John  (1948)  was  also  an  early  contributor  to  inequality-constrained 
optimization.  Kuhn’s  survey  paper  (Kuhn  1976)  gives  a  historical  account  of  the 
development  of  the  subject. 


CHAPTER  20 


Implementation  Issues  for  Interior-Point  Methods 


In  this  chapter,  we  discuss  implementation  issues  that  arise  in  connection  with 
the  path-following  method. 

The  most  important  issue  is  the  efficient  solution  of  the  systems  of  equations 
discussed  in  the  previous  chapter.  As  we  saw,  there  are  basically  three  choices, 
involving  either  the  reduced  KKT  matrix, 


(20.1)
$$
\begin{bmatrix} -E^{-2} & A \\ A^T & D^{-2} \end{bmatrix},
$$

or one of the two matrices associated with the normal equations:

(20.2)  $AD^2A^T + E^{-2}$

or

(20.3)  $A^TE^2A + D^{-2}.$

(Here, $E^{-2} = Y^{-1}W$ and $D^{-2} = X^{-1}Z$.)

In  the  previous  chapter,  we  explained  that  dense  columns/rows  are  bad  for  the 
normal equations and that therefore one might be better off solving the system
involving the reduced KKT matrix. But there is also a reason one might prefer to
work with one of the systems of normal equations. The reason is that these matrices
are positive definite. We shall show in the first section that there are important
advantages  in  working  with  positive  definite  matrices.  In  the  second  section,  we 
shall  consider  the  reduced  KKT  matrix  and  see  to  what  extent  the  nice  properties 
possessed  by  positive  definite  matrices  carry  over  to  these  matrices. 

After  finishing  our  investigations  into  numerical  factorization,  we  shall  take  up 
a  few  other  relevant  tasks,  such  as  how  one  extends  the  path-following  algorithm  to 
handle  problems  with  bounds  and  ranges. 


1.  Factoring  Positive  Definite  Matrices 

As we saw in the proof of Theorem 19.1, the matrix (20.2) appearing in the
primal normal equations is positive semidefinite (and so is (20.3), of course). In fact, it
is even better: it’s positive definite. A matrix $B$ is positive definite if $\xi^TB\xi > 0$ for
all vectors $\xi \ne 0$. In this section, we will show that, if we restrict our row/column
reordering to symmetric reorderings, that is, reorderings where the rows and columns
undergo the same permutation, then there is no danger of encountering a pivot
element whose value is zero. Hence, the row/column permutation can be selected
ahead of time based only on the aim of maintaining sparsity.

If we restrict ourselves to symmetric permutations, each pivot element is a
diagonal element of the matrix. The following result shows that we can start by picking
an arbitrary diagonal element as the first pivot element:

Theorem 20.1. If $B$ is positive definite, then $b_{ii} > 0$ for all $i$.

The proof follows trivially from the definition of positive definiteness:

$b_{ii} = e_i^TBe_i > 0.$


The next step is to show that after each stage of the elimination process, the
remaining uneliminated matrix is positive definite. Let us illustrate by breaking out the
first row/column of the matrix and looking at what the first step of the elimination
process does. Breaking out the first row/column, we write

$$
B = \begin{bmatrix} a & b^T \\ b & C \end{bmatrix}.
$$

Here, $a$ is the first diagonal element (a scalar), $b$ is the column below $a$, and $C$ is the
matrix consisting of all of $B$ except the first row/column. One step of elimination
(as described in Chapter 8) transforms $B$ into

$$
\begin{bmatrix} a & b^T \\ b & C - \dfrac{bb^T}{a} \end{bmatrix}.
$$

The following theorem tells us that the uneliminated part is positive definite:

Theorem 20.2. If $B$ is positive definite, then so is $C - bb^T/a$.

Proof. The fact that $B$ is positive definite implies that

(20.4)
$$
\begin{bmatrix} x & y^T \end{bmatrix}
\begin{bmatrix} a & b^T \\ b & C \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= ax^2 + 2y^Tbx + y^TCy
$$

is positive whenever the scalar $x$ or the vector $y$ is nonzero (or both). Fix a vector
$y \ne 0$, and put $x = -\frac{1}{a}b^Ty$. Using these choices in (20.4), we get

$$
0 < \frac{1}{a}y^Tbb^Ty - 2\,\frac{1}{a}y^Tbb^Ty + y^TCy = y^T\left(C - \frac{bb^T}{a}\right)y.
$$

Since $y$ was an arbitrary nonzero vector, it follows that $C - bb^T/a$ is positive definite. □


Hence, after one step of the elimination, the uneliminated part is positive definite.
It follows by induction then that the uneliminated part is positive definite at
every  step  of  the  elimination. 

Here’s an example:

$$
B = \begin{bmatrix}
 2 & -1 &    &    & -1 \\
-1 &  3 & -1 & -1 &    \\
   & -1 &  2 & -1 &    \\
   & -1 & -1 &  3 & -1 \\
-1 &    &    & -1 &  3
\end{bmatrix}.
$$


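At the end of the four steps of the elimination (without permutations), we end up
with

$$
\begin{bmatrix}
 2 & -1 &  &  & -1 \\
-1 & \frac{5}{2} & -1 & -1 & -\frac{1}{2} \\
   & -1 & \frac{8}{5} & -\frac{7}{5} & -\frac{1}{5} \\
   & -1 & -\frac{7}{5} & \frac{11}{8} & -\frac{11}{8} \\
-1 & -\frac{1}{2} & -\frac{1}{5} & -\frac{11}{8} & 1
\end{bmatrix}.
$$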

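From this eliminated form, we extract the lower triangular matrix, the diagonal
matrix, and the upper triangular matrix to write $B$ as

$$
B =
\begin{bmatrix}
 2 & & & & \\
-1 & \frac{5}{2} & & & \\
   & -1 & \frac{8}{5} & & \\
   & -1 & -\frac{7}{5} & \frac{11}{8} & \\
-1 & -\frac{1}{2} & -\frac{1}{5} & -\frac{11}{8} & 1
\end{bmatrix}
\begin{bmatrix}
2 & & & & \\
 & \frac{5}{2} & & & \\
 & & \frac{8}{5} & & \\
 & & & \frac{11}{8} & \\
 & & & & 1
\end{bmatrix}^{-1}
\begin{bmatrix}
2 & -1 & & & -1 \\
 & \frac{5}{2} & -1 & -1 & -\frac{1}{2} \\
 & & \frac{8}{5} & -\frac{7}{5} & -\frac{1}{5} \\
 & & & \frac{11}{8} & -\frac{11}{8} \\
 & & & & 1
\end{bmatrix}.
$$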

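As we saw in Chapter 8, it is convenient to combine the lower triangular matrix with
the diagonal matrix to get a new lower triangular matrix with ones on the diagonal.
But the current lower triangular matrix is exactly the transpose of the upper
triangular matrix. Hence, to preserve symmetry, we should combine the diagonal matrix
with both the lower and the upper triangular matrices. Since it only appears once,
we must multiply and divide by it (in the middle of the product). Doing this, we get

$$
B =
\begin{bmatrix}
1 & & & & \\
-\frac{1}{2} & 1 & & & \\
 & -\frac{2}{5} & 1 & & \\
 & -\frac{2}{5} & -\frac{7}{8} & 1 & \\
-\frac{1}{2} & -\frac{1}{5} & -\frac{1}{8} & -1 & 1
\end{bmatrix}
\begin{bmatrix}
2 & & & & \\
 & \frac{5}{2} & & & \\
 & & \frac{8}{5} & & \\
 & & & \frac{11}{8} & \\
 & & & & 1
\end{bmatrix}
\begin{bmatrix}
1 & -\frac{1}{2} & & & -\frac{1}{2} \\
 & 1 & -\frac{2}{5} & -\frac{2}{5} & -\frac{1}{5} \\
 & & 1 & -\frac{7}{8} & -\frac{1}{8} \\
 & & & 1 & -1 \\
 & & & & 1
\end{bmatrix}.
$$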

The lower triangular matrix in this representation is usually denoted by $L$ and the
diagonal matrix by $D$ (not to be confused with the $D$ at the beginning of the chapter).
Hence, this factorization can be summarized as

$B = LDL^T$

and is referred to as an $LDL^T$-factorization. Of course, once a factorization is
found, it is easy to solve systems of equations using forward and backward
substitution as discussed in Chapter 8.
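
The elimination just described takes only a few lines of code. The sketch below is a dense, unpermuted illustration in exact rational arithmetic; it is not the book's software, the function names `ldlt` and `solve_ldlt` are chosen here, and the matrix entered is the $5 \times 5$ example above as read in this section. It computes $L$ and the pivots, and then solves a system by forward substitution, scaling, and backward substitution.

```python
from fractions import Fraction as F

def ldlt(B):
    """Factor a symmetric matrix as B = L D L^T without permutations.
    Returns (L, d), where L is unit lower triangular and d lists the pivots."""
    n = len(B)
    S = [[F(v) for v in row] for row in B]       # working copy; holds the uneliminated part
    L = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    d = []
    for k in range(n):
        pivot = S[k][k]
        if pivot == 0:
            raise ZeroDivisionError(f"zero pivot at step {k}")
        d.append(pivot)
        for i in range(k + 1, n):
            L[i][k] = S[i][k] / pivot
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                S[i][j] -= L[i][k] * S[k][j]     # the update C - b b^T / a
    return L, d

def solve_ldlt(L, d, r):
    """Solve L D L^T u = r by forward substitution, scaling, and back substitution."""
    n = len(d)
    u = [F(v) for v in r]
    for i in range(n):                            # forward: L t = r
        u[i] -= sum(L[i][k] * u[k] for k in range(i))
    u = [ui / di for ui, di in zip(u, d)]         # scaling: D s = t
    for i in reversed(range(n)):                  # backward: L^T u = s
        u[i] -= sum(L[k][i] * u[k] for k in range(i + 1, n))
    return u

B = [[ 2, -1,  0,  0, -1],
     [-1,  3, -1, -1,  0],
     [ 0, -1,  2, -1,  0],
     [ 0, -1, -1,  3, -1],
     [-1,  0,  0, -1,  3]]
L, d = ldlt(B)
r = [1, 0, 0, 0, 0]                               # an arbitrary right-hand side
u = solve_ldlt(L, d, r)
residual = [sum(B[i][j] * u[j] for j in range(5)) - r[i] for i in range(5)]
print(d, residual)
```

A production code would store only the nonzeros of $L$ and would choose the symmetric permutation beforehand to limit fill-in; the dense loops here are kept only for readability.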




1.1.  Stability.  We began our discussion of factoring positive definite matrices
with the comment that a symmetric permutation can be chosen purely with the aim
of preserving sparsity, since it is guaranteed that no pivot element will ever vanish.
However, the situation is even better than that: we can show that whenever a pivot
element is small, so is every other nonzero in the uneliminated part of the same
row/column. Before saying why, we need to set down a few technical results.

Theorem 20.3. If $\bar b_{ii}$ denotes a diagonal element in the uneliminated
submatrix at some stage of an elimination and $b_{ii}$ denotes the original value of that
diagonal element, then $0 < \bar b_{ii} \le b_{ii}$.


Proof. The positivity of $\bar b_{ii}$ follows from the fact that the uneliminated submatrix
is positive definite. The fact that it is bounded above by $b_{ii}$ follows from the fact
that each step of the elimination can only decrease diagonal elements, which can be
seen by looking at the first step of the elimination. Using the notation introduced
just after Theorem 20.1,

$$
c_{ii} - \frac{b_i^2}{a} \le c_{ii}.
$$

□


Theorem 20.4. If $B$ is symmetric and positive definite, then $|b_{ij}| < \sqrt{b_{ii}b_{jj}}$
for all $i \ne j$.

Proof. Fix $i \ne j$ and let $\xi = re_i + e_j$. That is, $\xi$ is the vector that’s all zero
except for the $i$th and $j$th position, where it’s $r$ and 1, respectively. Then,

$0 < \xi^TB\xi = b_{ii}r^2 + 2b_{ij}r + b_{jj},$

for all $r \in \mathbb{R}$. This quadratic expression is positive for all values of $r$ if and only if
it is positive at its minimum, which occurs at $r = -b_{ij}/b_{ii}$ and has value
$b_{jj} - b_{ij}^2/b_{ii}$. It’s easy to check that this value is positive if
and only if $|b_{ij}| < \sqrt{b_{ii}b_{jj}}$. □


These two theorems, together with the fact that the uneliminated submatrix
is symmetric and positive definite, give us bounds on the off-diagonal elements.
Indeed, consider the situation after a number of steps of the elimination. Using bars
to denote matrix elements in the uneliminated submatrix and letting $M$ denote an
upper bound on the diagonal elements before the elimination process began (which,
without loss of generality, could be taken as 1), we see that, if $\bar b_{jj} < \epsilon$, then

(20.5)  $|\bar b_{ij}| \le \sqrt{\epsilon M}.$

This  bound  is  exceedingly  important  and  is  special  to  positive  definite  matrices. 
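
Both theorems are easy to watch in action. The following sketch is an illustration only, reusing the $5 \times 5$ positive definite example of this section as read here. It checks at every stage of the elimination that each remaining diagonal element is positive and no larger than its original value (Theorem 20.3) and that each remaining off-diagonal element satisfies the bound of Theorem 20.4, tested in squared form so that no square roots are needed.

```python
from fractions import Fraction as F

B = [[ 2, -1,  0,  0, -1],
     [-1,  3, -1, -1,  0],
     [ 0, -1,  2, -1,  0],
     [ 0, -1, -1,  3, -1],
     [-1,  0,  0, -1,  3]]
n = len(B)
S = [[F(v) for v in row] for row in B]     # rows/columns k..n-1 hold the uneliminated part
bounds_hold = True
for k in range(n):
    for i in range(k, n):
        # Theorem 20.3: diagonals stay positive and never exceed their original values.
        bounds_hold &= 0 < S[i][i] <= B[i][i]
        for j in range(i + 1, n):
            # Theorem 20.4, squared: b_ij^2 < b_ii * b_jj.
            bounds_hold &= S[i][j] ** 2 < S[i][i] * S[j][j]
    for i in range(k + 1, n):              # one elimination step: C - b b^T / a
        for j in range(k + 1, n):
            S[i][j] -= S[i][k] * S[k][j] / S[k][k]
print(bounds_hold)
```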


2.  Quasidefinite  Matrices 

In  this  section,  we  shall  study  factorization  techniques  for  the  reduced  KKT 
matrix  (20.1).  The  reduced  KKT  matrix  is  an  example  of  a  quasidefinite  matrix. 
A symmetric matrix is called quasidefinite if it can be written (perhaps after a
symmetric permutation) as

$$
B = \begin{bmatrix} -E & A \\ A^T & D \end{bmatrix},
$$

where $E$ and $D$ are positive definite matrices. Quasidefinite matrices inherit some
of the nice properties of positive definite matrices. In particular, one can perform
an arbitrary symmetric permutation of the rows/columns and still be able to form a
factorization of the permuted matrix.

The idea is that, after each step of the elimination, the remaining
uneliminated part of the matrix is still quasidefinite. To see why, let’s break out the first
row/column of the matrix and look at the first step of the elimination process.
Breaking out the first row/column of $B$, we write

$$
B = \begin{bmatrix} -a & -b^T & f^T \\ -b & -C & G \\ f & G^T & D \end{bmatrix},
$$

where $a$ is a scalar, $b$ and $f$ are vectors, and $C$, $D$, and $G$ are matrices (of the
appropriate dimensions). One step of the elimination process transforms $B$ into

$$
\begin{bmatrix}
-a & -b^T & f^T \\
-b & -\left(C - \dfrac{bb^T}{a}\right) & G - \dfrac{bf^T}{a} \\
f & G^T - \dfrac{fb^T}{a} & D + \dfrac{ff^T}{a}
\end{bmatrix}.
$$

The uneliminated part is

$$
\begin{bmatrix}
-\left(C - \dfrac{bb^T}{a}\right) & G - \dfrac{bf^T}{a} \\
G^T - \dfrac{fb^T}{a} & D + \dfrac{ff^T}{a}
\end{bmatrix}.
$$

Clearly,  the  lower-left  and  upper-right  blocks  are  transposes  of  each  other.  Also,  the 
upper-left  and  lower-right  blocks  are  symmetric,  since  C  and  D  are.  Therefore,  the 
whole matrix is symmetric. Theorem 20.2 tells us that $C - bb^T/a$ is positive definite
and $D + ff^T/a$ is positive definite, since the sum of a positive definite matrix and
a  positive  semidefinite  matrix  is  positive  definite  (see  Exercise  20.2).  Therefore,  the 
uneliminated  part  is  indeed  quasidefinite. 

Of course, had the first pivot element been selected from the submatrix $D$
instead of $E$, perhaps the story would be different. But it is easy to check that it’s the
same. Hence, no matter which diagonal element is selected to be the first pivot
element, the resulting uneliminated part is quasidefinite. Now, by induction it follows
that  every  step  of  the  elimination  process  involves  choosing  a  pivot  element  from 
the  diagonals  of  a  quasidefinite  matrix.  Since  these  diagonals  come  from  either  a 
positive  definite  submatrix  or  the  negative  of  such  a  matrix,  it  follows  that  they  are 
always  nonzero  (but  many  of  them  will  be  negative).  Therefore,  just  as  for  positive 
definite  matrices,  an  arbitrary  symmetric  permutation  of  a  quasidefinite  matrix  can 
be  factored  without  any  risk  of  encountering  a  zero  pivot  element. 
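
This claim can be tested exhaustively on a small case. The sketch below uses an illustrative $5 \times 5$ quasidefinite matrix with $-E = \mathrm{diag}(-1,-2,-3)$ and $D = \mathrm{diag}(2,1)$ (the particular entries are an assumption made for this example) and eliminates along the diagonal in every one of the 120 possible symmetric orderings, in exact arithmetic. The function name `pivots` is chosen here.

```python
from fractions import Fraction as F
from itertools import permutations

def pivots(B, order):
    """Pivots produced by eliminating B along its diagonal in the given order
    (a symmetric permutation). Returns None if a zero pivot is encountered."""
    S = [[F(B[i][j]) for j in order] for i in order]
    n = len(S)
    piv = []
    for k in range(n):
        a = S[k][k]
        if a == 0:
            return None
        piv.append(a)
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                S[i][j] -= S[i][k] * S[k][j] / a
    return piv

# Illustrative quasidefinite matrix: rows 0-2 carry -E, rows 3-4 carry D.
B = [[-1,  0,  0, -2, 1],
     [ 0, -2,  0,  0, 2],
     [ 0,  0, -3,  1, 0],
     [-2,  0,  1,  2, 0],
     [ 1,  2,  0,  0, 1]]

results = [pivots(B, p) for p in permutations(range(5))]
no_zero_pivot = all(r is not None for r in results)
# Every ordering should yield three negative and two positive pivots.
signs_ok = no_zero_pivot and all(
    sum(a < 0 for a in r) == 3 and sum(a > 0 for a in r) == 2 for r in results)
print(no_zero_pivot, signs_ok)
print(pivots(B, [0, 4, 1, 3, 2]))      # the ordering 1, 5, 2, 4, 3 in 1-based labels
```

The pivot values depend on the ordering, but their signs do not: there is always one negative pivot for each row of $-E$ and one positive pivot for each row of $D$.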

Here’s an example:

(20.6)
$$
B = \begin{array}{r|rrrrr}
  & 1 & 2 & 3 & 4 & 5 \\ \hline
1 & -1 &    &    & -2 & 1 \\
2 &    & -2 &    &    & 2 \\
3 &    &    & -3 & 1  &   \\
4 & -2 &    & 1  & 2  &   \\
5 & 1  & 2  &    &    & 1
\end{array}
$$


(The blocks are easy to pick out, since the negative diagonals must be from $-E$,
whereas the positive ones are from $D$.) Let’s eliminate by picking the diagonals in
the order 1, 5, 2, 4, 3. No permutations are needed in preparation for the first step of
the elimination. After this step, we have

$$
\begin{array}{r|rrrrr}
  & 1 & 2 & 3 & 4 & 5 \\ \hline
1 & -1 &    &    & -2 & 1  \\
2 &    & -2 &    &    & 2  \\
3 &    &    & -3 & 1  &    \\
4 & -2 &    & 1  & 6  & -2 \\
5 & 1  & 2  &    & -2 & 2
\end{array}
$$

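Now, we move row/column 5 to the pivot position, slide the other rows/columns
down/over, and eliminate to get

$$
\begin{array}{r|rrrrr}
  & 1 & 5 & 2 & 3 & 4 \\ \hline
1 & -1 & 1  &    &    & -2 \\
5 & 1  & 2  & 2  &    & -2 \\
2 &    & 2  & -4 &    & 2  \\
3 &    &    &    & -3 & 1  \\
4 & -2 & -2 & 2  & 1  & 4
\end{array}
$$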

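Row/column 2 is in the correct position for the third step of the elimination, and
therefore, without further ado, we do the next step in the elimination:

$$
\begin{array}{r|rrrrr}
  & 1 & 5 & 2 & 3 & 4 \\ \hline
1 & -1 & 1  &    &    & -2 \\
5 & 1  & 2  & 2  &    & -2 \\
2 &    & 2  & -4 &    & 2  \\
3 &    &    &    & -3 & 1  \\
4 & -2 & -2 & 2  & 1  & 5
\end{array}
$$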


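Finally, we interchange rows/columns 3 and 4 and do the last elimination step to get

$$
\begin{array}{r|rrrrr}
  & 1 & 5 & 2 & 4 & 3 \\ \hline
1 & -1 & 1  &    & -2 &    \\
5 & 1  & 2  & 2  & -2 &    \\
2 &    & 2  & -4 & 2  &    \\
4 & -2 & -2 & 2  & 5  & 1  \\
3 &    &    &    & 1  & -\frac{16}{5}
\end{array}
$$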
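From this final matrix, it is easy to extract the $LDL^T$-factorization of the
permutation of $B$:

$$
\begin{array}{r|rrrrr}
  & 1 & 5 & 2 & 4 & 3 \\ \hline
1 & -1 & 1  &    & -2 &    \\
5 & 1  & 1  & 2  &    &    \\
2 &    & 2  & -2 &    &    \\
4 & -2 &    &    & 2  & 1  \\
3 &    &    &    & 1  & -3
\end{array}
=
\begin{bmatrix}
1 & & & & \\
-1 & 1 & & & \\
 & 1 & 1 & & \\
2 & -1 & -\frac{1}{2} & 1 & \\
 & & & \frac{1}{5} & 1
\end{bmatrix}
\begin{bmatrix}
-1 & & & & \\
 & 2 & & & \\
 & & -4 & & \\
 & & & 5 & \\
 & & & & -\frac{16}{5}
\end{bmatrix}
\begin{bmatrix}
1 & -1 & & 2 & \\
 & 1 & 1 & -1 & \\
 & & 1 & -\frac{1}{2} & \\
 & & & 1 & \frac{1}{5} \\
 & & & & 1
\end{bmatrix}.
$$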

As  always,  the  factorization  is  computed  so  that  one  can  solve  systems  of  equations. 
Given  the  factorization,  all  that  is  required  is  a  forward  substitution,  a  backward 
substitution,  a  scaling,  and  two  permutations. 


2.1.  Instability.  We have shown that an arbitrary symmetric permutation of
the rows/columns of a quasidefinite matrix can be factored into $LDL^T$. That is,
mathematically none of the pivot elements will be zero. But they can be small, and
their smallness can cause troubles not encountered with positive definite matrices.
To explain, let’s look at an example. Consider the linear programming problem

maximize    $x_1 + x_2$
subject to  $x_1 + 2x_2 \le 1$
            $2x_1 + x_2 \le 1$
            $x_1, x_2 \ge 0$

and its dual

minimize    $y_1 + y_2$
subject to  $y_1 + 2y_2 \ge 1$
            $2y_1 + y_2 \ge 1$
            $y_1, y_2 \ge 0.$

Drawing pictures, it is easy to see that the optimal solution is

$x_1^* = x_2^* = y_1^* = y_2^* = \frac{1}{3}$

$z_1^* = z_2^* = w_1^* = w_2^* = 0.$


Therefore, as the optimal solution is approached, the diagonal elements in $X^{-1}Z$
and $Y^{-1}W$ approach zero. Therefore, at late stages in the path-following method,
one is faced with the problem of factoring the following reduced KKT matrix:

$$
B = \begin{array}{r|rrrr}
  & 1 & 2 & 3 & 4 \\ \hline
1 & -\epsilon_1 &  & 1 & 2 \\
2 &  & -\epsilon_2 & 2 & 1 \\
3 & 1 & 2 & \delta_1 &  \\
4 & 2 & 1 &  & \delta_2
\end{array}
$$

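where $\epsilon_1$, $\epsilon_2$, $\delta_1$, and $\delta_2$ are small positive numbers. Consider what happens if
we permute the rows/columns so that the original diagonal elements appear in the
following order: 1, 3, 4, 2. The permuted matrix is seen to be

$$
B = \begin{array}{r|rrrr}
  & 1 & 3 & 4 & 2 \\ \hline
1 & -\epsilon_1 & 1 & 2 &  \\
3 & 1 & \delta_1 &  & 2 \\
4 & 2 &  & \delta_2 & 1 \\
2 &  & 2 & 1 & -\epsilon_2
\end{array}
$$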


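After the first two steps of the elimination, we have

(20.7)
$$
\begin{bmatrix}
-\epsilon_1 & 1 & 2 & \\[4pt]
1 & \left(\delta_1 + \frac{1}{\epsilon_1}\right) & \frac{2}{\epsilon_1} & 2 \\[4pt]
2 & \frac{2}{\epsilon_1} & \left(\delta_2 + \frac{4}{\epsilon_1}\right) - \dfrac{4/\epsilon_1^2}{\left(\delta_1 + \frac{1}{\epsilon_1}\right)} & 1 - \dfrac{4/\epsilon_1}{\left(\delta_1 + \frac{1}{\epsilon_1}\right)} \\[4pt]
 & 2 & 1 - \dfrac{4/\epsilon_1}{\left(\delta_1 + \frac{1}{\epsilon_1}\right)} & -\epsilon_2 - \dfrac{4}{\left(\delta_1 + \frac{1}{\epsilon_1}\right)}
\end{bmatrix}
$$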

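Using exact arithmetic, the uneliminated part simplifies to

$$
\begin{bmatrix}
\delta_2 + \dfrac{4\delta_1}{1 + \epsilon_1\delta_1} & 1 - \dfrac{4}{1 + \epsilon_1\delta_1} \\[6pt]
1 - \dfrac{4}{1 + \epsilon_1\delta_1} & -\epsilon_2 - \dfrac{4\epsilon_1}{1 + \epsilon_1\delta_1}
\end{bmatrix}.
$$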

Here, the diagonal elements are small but not zero and the off-diagonal elements are
close to $-3$. But on a computer that stores numbers with finite precision, the
computation comes out quite differently when the $\epsilon_i$’s and the $\delta_i$’s are smaller than the
square root of the machine precision (i.e., about $10^{-8}$ for double-precision
floating-point numbers). In the elimination process, the parenthesized expressions in (20.7)
are evaluated before the other operations. Hence, in finite precision arithmetic, these
expressions produce

$\delta_2 + \dfrac{4}{\epsilon_1} = \dfrac{4}{\epsilon_1}$  and  $\delta_1 + \dfrac{1}{\epsilon_1} = \dfrac{1}{\epsilon_1},$

and so (20.7) becomes

$$
\begin{array}{r|rrrr}
  & 1 & 3 & 4 & 2 \\ \hline
1 & -\epsilon_1 & 1 & 2 &  \\
3 & 1 & \frac{1}{\epsilon_1} & \frac{2}{\epsilon_1} & 2 \\
4 & 2 & \frac{2}{\epsilon_1} & 0 & -3 \\
2 &  & 2 & -3 & -\epsilon_2 - 4\epsilon_1
\end{array}
$$

which clearly presents a problem for the next step in the elimination process.
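
Now let’s consider what happens if the elimination process is applied directly
to $B$ without permuting the rows/columns. Sparing the reader the tedious details,
the end result of the elimination is the following matrix:

(20.8)
$$
\begin{bmatrix}
-\epsilon_1 &  & 1 & 2 \\[4pt]
 & -\epsilon_2 & 2 & 1 \\[4pt]
1 & 2 & \left(\delta_1 + \frac{1}{\epsilon_1} + \frac{4}{\epsilon_2}\right) & \frac{2}{\epsilon_1} + \frac{2}{\epsilon_2} \\[4pt]
2 & 1 & \frac{2}{\epsilon_1} + \frac{2}{\epsilon_2} & \left(\delta_2 + \frac{4}{\epsilon_1} + \frac{1}{\epsilon_2}\right) - \dfrac{\left(\frac{2}{\epsilon_1} + \frac{2}{\epsilon_2}\right)^2}{\left(\delta_1 + \frac{1}{\epsilon_1} + \frac{4}{\epsilon_2}\right)}
\end{bmatrix}
$$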


As  before,  in  finite  precision  arithmetic,  certain  small  numbers  get  lost: 


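\[
\delta_2 + \frac{4}{\epsilon_1} = \frac{4}{\epsilon_1}
\qquad\text{and}\qquad
\delta_1 + \frac{1}{\epsilon_1} = \frac{1}{\epsilon_1}.
\]

Making these substitutions in (20.8), we see that the final matrix produced by the
elimination process using finite precision arithmetic is

\[
\begin{bmatrix}
-\epsilon_1 & & 1 & 2 \\
 & -\epsilon_2 & 2 & 1 \\
1 & 2 & \frac{1}{\epsilon_1} + \frac{4}{\epsilon_2} & \frac{2}{\epsilon_1} + \frac{2}{\epsilon_2} \\
2 & 1 & \frac{2}{\epsilon_1} + \frac{2}{\epsilon_2} & 0
\end{bmatrix}.
\]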

Just  as  before,  the  fact  that  small  numbers  got  lost  has  resulted  in  a  zero  appearing 
on  the  diagonal  where  a  small  but  nonzero  (in  this  case  positive)  number  belongs. 
However,  the  situation  is  fundamentally  different  this  time.  With  the  first  ordering, 
a $-3$ remained to be eliminated under the zero diagonal element, whereas with the
second  ordering,  this  did  not  happen.  Of  course,  it  didn’t  happen  in  this  particular 
example  because  the  0  appeared  as  the  last  pivot  element,  which  has  no  elements 
below  it  to  be  eliminated.  But  that  is  not  the  general  reason  why  the  second  ordering 
does  not  produce  nonzeros  under  zero  pivot  elements.  In  general,  a  zero  (which 
should  be  a  small  positive)  can  appear  anywhere  in  the  lower-right  block  (relative 
to  the  original  quasidefinite  partitioning).  But  once  the  elimination  process  gets 
to  this  block,  the  remaining  uneliminated  part  of  the  matrix  is  positive  definite. 
Hence,  the  estimate  in  (20.5)  can  be  used  to  tell  us  that  all  the  nonzeros  below  a 
zero  diagonal  are  in  fact  small  themselves.  A  zero  appearing  on  the  diagonal  only 
presents  a  problem  if  there  are  nonzeros  below  it  that  need  to  be  eliminated.  If  there 
are  none,  then  the  elimination  can  simply  proceed  to  the  next  pivot  element  (see 
Exercise  20.1). 

Let’s summarize the situation. We showed in the last chapter that the possibility
of dense rows/columns makes it unattractive to work strictly with the normal
equations.  Yet,  although  the  quasidefinite  reduced  KKT  system  can  be  used,  it  is 
numerically  less  stable.  A  compromise  solution  seems  to  be  suggested.  One  could 
take  a  structured  approach  to  the  reordering  heuristic.  In  the  structured  approach, 
one  decides  first  whether  it  seems  better  to  begin  pivoting  with  elements  from  the 
upper-left  block  or  from  the  lower-right  block.  Once  this  decision  is  made,  one 
should  pivot  out  all  the  diagonal  elements  from  this  block  before  working  on  the 
other block, with the exception that pivots involving dense rows/columns be deferred
to the end of the elimination process. If no dense columns are identified, this
strategy  mimics  the  normal  equations  approach.  Indeed,  after  eliminating  all  the 
diagonal  elements  in  the  upper-left  block,  the  remaining  uneliminated  lower-right 
block  contains  exactly  the  matrix  for  the  system  of  dual  normal  equations.  Similarly, 
had  the  initial  choice  been  to  pivot  out  all  the  diagonal  elements  from  the  lower-right 
block,  then  the  remaining  uneliminated  upper-left  block  becomes  the  matrix  for  the 
system  of  primal  normal  equations. 

With this structured approach, if no dense rows/columns are identified and deferred,
then the elimination process is numerically stable. If, on the other hand,
some  dense  rows/columns  are  deferred,  then  the  factorization  is  less  stable.  But  in 
practice,  this  approach  seems  to  work  well.  Of  course,  one  could  be  more  careful 
and  monitor  the  diagonal  elements.  If  a  diagonal  element  gets  small  (relative  to  the 
other  uneliminated  nonzeros  in  the  same  row/column),  then  one  could  flag  it  and 
then  calculate  a  new  ordering  in  which  such  pivot  elements  are  deferred  to  the  end 
of  the  elimination  process. 


3.  Problems  in  General  Form 

In this section, we describe how to adapt the path-following algorithm to solving
problems presented in the following general form:

\[
\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & a \le Ax \le b \\
 & l \le x \le u.
\end{array}
\tag{20.9}
\]

As in Chapter 9, some of the data elements are allowed to take on infinite values.
However, let us consider first the case where all the components of $a$, $b$, $l$, and $u$ are
finite. Infinities require special treatment, which shall be discussed shortly.

Following  the  derivation  of  the  path-following  method  that  we  introduced  in 
Chapter  18,  the  first  step  is  to  introduce  slack  variables  as  appropriate  to  replace  all 
inequality  constraints  with  simple  nonnegativity  constraints.  Hence,  we  rewrite  the 
primal  problem  (20.9)  as  follows: 

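\[
\begin{array}{lrcl}
\text{maximize} & c^T x & & \\
\text{subject to} & Ax + f &=& b \\
 & -Ax + p &=& -a \\
 & x + t &=& u \\
 & -x + g &=& -l \\
 & f,\ p,\ t,\ g &\ge& 0.
\end{array}
\]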

In  Chapter  9,  we  showed  that  the  dual  problem  is  given  by 

\[
\begin{array}{ll}
\text{minimize} & b^T v - a^T q + u^T s - l^T h \\
\text{subject to} & A^T (v - q) - (h - s) = c \\
 & v,\ q,\ s,\ h \ge 0,
\end{array}
\]



and  the  corresponding  complementarity  conditions  are  given  by 

\[
\begin{array}{ll}
f_i v_i = 0, & i = 1, 2, \ldots, m, \\
p_i q_i = 0, & i = 1, 2, \ldots, m, \\
t_j s_j = 0, & j = 1, 2, \ldots, n, \\
g_j h_j = 0, & j = 1, 2, \ldots, n.
\end{array}
\]

The next step in the derivation is to introduce the primal-dual central path,
which we parametrize as usual by a positive real parameter $\mu$. Indeed, for each
$\mu > 0$, we define the associated central-path point in primal-dual space as the
unique point that simultaneously satisfies the conditions of primal feasibility, dual
feasibility, and $\mu$-complementarity. Ignoring nonnegativity (which is enforced
separately), these conditions are

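\[
\begin{array}{rcl}
Ax + f &=& b \\
f + p &=& b - a \\
x + t &=& u \\
-x + g &=& -l \\[1ex]
A^T y + s - h &=& c \\
y + q - v &=& 0 \\[1ex]
FVe &=& \mu e \\
PQe &=& \mu e \\
TSe &=& \mu e \\
GHe &=& \mu e.
\end{array}
\]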

Note that we have replaced the primal feasibility condition, $-Ax + p = -a$, with
the equivalent condition that $f + p = b - a$, and we have introduced into the dual
problem new variables $y$ defined by $y = v - q$. The reason for these changes is
to put the system of equations into a form in which $A$ and $A^T$ appear as little as
possible (so that solving the system of equations for the step direction will be as
efficient as possible).

The last four equations are the $\mu$-complementarity conditions. As usual, each
upper-case letter that appears on the left in these equations denotes the diagonal
matrix having the components of the corresponding lower-case vector on its diagonal.
The system is a nonlinear system of $5n + 5m$ equations in $5n + 5m$ unknowns.
It has a unique solution in the strict interior of the following subset of primal-dual
space:

\[
\{(x, f, p, t, g, y, v, q, s, h) : f, p, t, g, v, q, s, h \ge 0\}.
\tag{20.10}
\]

This  fact  can  be  seen  by  noting  that  these  equations  are  the  first-order  optimality 
conditions  for  an  associated  strictly  convex  barrier  problem. 

As $\mu$ tends to zero, the central path converges to the optimal solution to both the
primal and dual problems. The path-following algorithm is defined as an iterative
process that starts from a point in the strict interior of (20.10), estimates at each
iteration a value of $\mu$ representing a point on the central path that is in some sense
closer to the optimal solution than the current point, and then attempts to step toward
this central-path point, making sure that the new point remains in the strict interior
of the set given in (20.10).

Suppose for the moment that we have already decided on the target value for
$\mu$. Let $(x, \ldots, h)$ denote the current point in the strict interior of (20.10), and let
$(x + \Delta x, \ldots, h + \Delta h)$ denote the point on the central path corresponding to the
target value of $\mu$. The defining equations for the point on the central path can be
written as


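\[
\begin{array}{rcll}
A\Delta x + \Delta f &=& b - Ax - f & =: \rho \\
\Delta f + \Delta p &=& b - a - f - p & =: \alpha \\
\Delta x + \Delta t &=& u - x - t & =: \tau \\
-\Delta x + \Delta g &=& -l + x - g & =: \nu \\[1ex]
A^T \Delta y + \Delta s - \Delta h &=& c - A^T y - s + h & =: \sigma \\
\Delta y + \Delta q - \Delta v &=& -y - q + v & =: \beta \\[1ex]
FV^{-1} \Delta v + \Delta f &=& \mu V^{-1} e - f - V^{-1} \Delta V \Delta f & =: \gamma_f \\
QP^{-1} \Delta p + \Delta q &=& \mu P^{-1} e - q - P^{-1} \Delta P \Delta q & =: \gamma_q \\
ST^{-1} \Delta t + \Delta s &=& \mu T^{-1} e - s - T^{-1} \Delta T \Delta s & =: \gamma_s \\
HG^{-1} \Delta g + \Delta h &=& \mu G^{-1} e - h - G^{-1} \Delta G \Delta h & =: \gamma_h,
\end{array}
\]
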

where we have introduced notations $\rho, \ldots, \gamma_h$ as shorthands for the right-hand side
expressions. This is almost a linear system for the direction vectors $(\Delta x, \ldots, \Delta h)$.
The only nonlinearities appear on the right-hand sides of the complementarity
equations (i.e., in $\gamma_f, \ldots, \gamma_h$). As we saw before, Newton’s method suggests that we
simply drop these nonlinear terms to obtain a linear system for the "delta" variables.

Clearly,  the  main  computational  burden  is  to  solve  the  system  shown  above.  It 
is  important  to  note  that  this  is  a  large,  sparse,  indefinite,  linear  system.  It  is  also 
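symmetric if one negates certain rows and rearranges rows and columns appropriately:

\[
\begin{bmatrix}
-FV^{-1} & & & & & & -I & & & \\
 & & & & & I & & & & I \\
 & & & & & -I & & & I & \\
 & & & & & & I & I & & \\
 & & & & & A & I & & & \\
 & I & -I & & A^T & & & & & \\
-I & & & I & I & & & & & \\
 & & & I & & & & QP^{-1} & & \\
 & & I & & & & & & HG^{-1} & \\
 & I & & & & & & & & ST^{-1}
\end{bmatrix}
\begin{bmatrix}
\Delta v \\ \Delta s \\ \Delta h \\ \Delta q \\ \Delta y \\ \Delta x \\ \Delta f \\ \Delta p \\ \Delta g \\ \Delta t
\end{bmatrix}
=
\begin{bmatrix}
-\gamma_f \\ \tau \\ \nu \\ \alpha \\ \rho \\ \sigma \\ \beta \\ \gamma_q \\ \gamma_h \\ \gamma_s
\end{bmatrix}.
\]
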

Because this system is symmetric, we look for symmetric pivots to solve it.
That is, we choose our pivots only from the diagonal elements. It turns out that we
can eliminate $\Delta v$, $\Delta p$, $\Delta g$, and $\Delta t$ using the nonzero diagonals $-FV^{-1}$, $QP^{-1}$,
$HG^{-1}$, and $ST^{-1}$, respectively, in any order without causing any nondiagonal
fill-in. Indeed, the equations for $\Delta v$, $\Delta p$, $\Delta g$, and $\Delta t$ are

\[
\begin{array}{rcl}
\Delta v &=& VF^{-1}(\gamma_f - \Delta f) \\
\Delta p &=& PQ^{-1}(\gamma_q - \Delta q) \\
\Delta g &=& GH^{-1}(\gamma_h - \Delta h) \\
\Delta t &=& TS^{-1}(\gamma_s - \Delta s),
\end{array}
\tag{20.11}
\]

and after elimination from the system, we get


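\[
\begin{bmatrix}
-TS^{-1} & & & & I & \\
 & -GH^{-1} & & & -I & \\
 & & -PQ^{-1} & & & I \\
 & & & & A & I \\
I & -I & & A^T & & \\
 & & I & I & & VF^{-1}
\end{bmatrix}
\begin{bmatrix}
\Delta s \\ \Delta h \\ \Delta q \\ \Delta y \\ \Delta x \\ \Delta f
\end{bmatrix}
=
\begin{bmatrix}
\hat{\tau} \\ \hat{\nu} \\ \hat{\alpha} \\ \rho \\ \sigma \\ \hat{\beta}
\end{bmatrix},
\]
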

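where again we have introduced abbreviated notations for the components of the
right-hand side:

\[
\begin{array}{rcl}
\hat{\tau} &=& \tau - TS^{-1}\gamma_s \\
\hat{\nu} &=& \nu - GH^{-1}\gamma_h \\
\hat{\alpha} &=& \alpha - PQ^{-1}\gamma_q \\
\hat{\beta} &=& \beta + VF^{-1}\gamma_f.
\end{array}
\]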

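Next we use the pivot elements $-TS^{-1}$, $-GH^{-1}$, and $-PQ^{-1}$ to solve for
$\Delta s$, $\Delta h$, and $\Delta q$, respectively:

\[
\begin{array}{rcl}
\Delta s &=& -ST^{-1}(\hat{\tau} - \Delta x) \\
\Delta h &=& -HG^{-1}(\hat{\nu} + \Delta x) \\
\Delta q &=& -QP^{-1}(\hat{\alpha} - \Delta f).
\end{array}
\tag{20.12}
\]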


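After eliminating these variables, the system simplifies to

\[
\begin{bmatrix}
 & A & I \\
A^T & D & \\
I & & E
\end{bmatrix}
\begin{bmatrix}
\Delta y \\ \Delta x \\ \Delta f
\end{bmatrix}
=
\begin{bmatrix}
\rho \\
\sigma + ST^{-1}\hat{\tau} - HG^{-1}\hat{\nu} \\
\hat{\beta} + QP^{-1}\hat{\alpha}
\end{bmatrix},
\]

where

\[
D = ST^{-1} + HG^{-1}
\]

and

\[
E = VF^{-1} + QP^{-1}.
\]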

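Finally, we use the pivot element $E$ to solve for $\Delta f$,

\[
\Delta f = E^{-1}(\hat{\beta} + QP^{-1}\hat{\alpha} - \Delta y),
\tag{20.13}
\]

which brings us to the reduced KKT equations:

\[
\begin{bmatrix}
-E^{-1} & A \\
A^T & D
\end{bmatrix}
\begin{bmatrix}
\Delta y \\ \Delta x
\end{bmatrix}
=
\begin{bmatrix}
\rho - E^{-1}(\hat{\beta} + QP^{-1}\hat{\alpha}) \\
\sigma + ST^{-1}\hat{\tau} - HG^{-1}\hat{\nu}
\end{bmatrix}.
\tag{20.14}
\]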

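initialize $(x, f, p, t, g, y, v, q, s, h)$ such that $f, p, t, g, v, q, s, h > 0$

while (not optimal) {

    $\rho = b - Ax - f$
    $\sigma = c - A^T y - s + h$
    $\gamma = f^T v + p^T q + t^T s + g^T h$
    $\mu = \delta \dfrac{\gamma}{n + m}$

    $\gamma_f = \mu V^{-1} e - f$
    $\gamma_q = \mu P^{-1} e - q$
    $\gamma_s = \mu T^{-1} e - s$
    $\gamma_h = \mu G^{-1} e - h$

    $\hat{\tau} = u - x - t - TS^{-1}\gamma_s$
    $\hat{\nu} = -l + x - g - GH^{-1}\gamma_h$
    $\hat{\alpha} = b - a - f - p - PQ^{-1}\gamma_q$
    $\hat{\beta} = -y - q + v + VF^{-1}\gamma_f$

    $D = ST^{-1} + HG^{-1}$
    $E = VF^{-1} + QP^{-1}$

    solve:
    \[
    \begin{bmatrix} -E^{-1} & A \\ A^T & D \end{bmatrix}
    \begin{bmatrix} \Delta y \\ \Delta x \end{bmatrix}
    =
    \begin{bmatrix} \rho - E^{-1}(\hat{\beta} + QP^{-1}\hat{\alpha}) \\ \sigma + ST^{-1}\hat{\tau} - HG^{-1}\hat{\nu} \end{bmatrix}
    \]

    compute: $\Delta f$ using (20.13), $\Delta s$, $\Delta h$, $\Delta q$ using (20.12),
    and $\Delta v$, $\Delta p$, $\Delta g$, $\Delta t$ using (20.11)

    choose a step length $\theta$ so that the new point remains in the strict interior of (20.10)

    $x \leftarrow x + \theta\Delta x$,   $y \leftarrow y + \theta\Delta y$,   $f \leftarrow f + \theta\Delta f$,   $v \leftarrow v + \theta\Delta v$
    $p \leftarrow p + \theta\Delta p$,   $q \leftarrow q + \theta\Delta q$,   $t \leftarrow t + \theta\Delta t$,   $s \leftarrow s + \theta\Delta s$
    $g \leftarrow g + \theta\Delta g$,   $h \leftarrow h + \theta\Delta h$

}

Figure 20.1. The path-following method: general form.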

Up to this point, none of the eliminations have produced any off-diagonal fill-in.
Also, the matrix for the system given in (20.14) is a symmetric quasidefinite matrix.
Hence, the techniques given in Section 19.2 for solving such systems can be used.
The algorithm is summarized in Figure 20.1.
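
The chain of eliminations is easy to get wrong by a sign, so a numerical check is worthwhile. The Python sketch below is only an illustration under stated assumptions: it treats the scalar case $m = n = 1$, so every matrix is a number, and it uses made-up data and a made-up current point. The function names `newton_direction` and `residuals` are hypothetical. It solves the reduced KKT system (20.14), back-substitutes through (20.13), (20.12), and (20.11), and then confirms in exact arithmetic that all ten linearized equations hold.

```python
from fractions import Fraction as Fr


def newton_direction(A, a, b, c, l, u, point, mu):
    """Scalar (m = n = 1) Newton direction via (20.11)-(20.14)."""
    x, f, p, t, g, y, v, q, s, h = point
    # Right-hand sides of the linearized system (nonlinear terms dropped).
    rhs = dict(
        rho=b - A * x - f, alpha=b - a - f - p, tau=u - x - t, nu=-l + x - g,
        sigma=c - A * y - s + h, beta=-y - q + v,
        gf=mu / v - f, gq=mu / p - q, gs=mu / t - s, gh=mu / g - h,
    )
    # Hatted right-hand sides after eliminating dv, dp, dg, dt.
    tau_hat = rhs["tau"] - (t / s) * rhs["gs"]
    nu_hat = rhs["nu"] - (g / h) * rhs["gh"]
    alpha_hat = rhs["alpha"] - (p / q) * rhs["gq"]
    beta_hat = rhs["beta"] + (v / f) * rhs["gf"]
    D = s / t + h / g
    E = v / f + q / p
    # Reduced KKT system [[-1/E, A], [A, D]] [dy, dx] = [r1, r2] by Cramer's rule.
    r1 = rhs["rho"] - (beta_hat + (q / p) * alpha_hat) / E
    r2 = rhs["sigma"] + (s / t) * tau_hat - (h / g) * nu_hat
    det = -D / E - A * A
    dy = (r1 * D - A * r2) / det
    dx = (-r2 / E - A * r1) / det
    # Back-substitution: (20.13), then (20.12), then (20.11).
    df = (beta_hat + (q / p) * alpha_hat - dy) / E
    ds = -(s / t) * (tau_hat - dx)
    dh = -(h / g) * (nu_hat + dx)
    dq = -(q / p) * (alpha_hat - df)
    dv = (v / f) * (rhs["gf"] - df)
    dp = (p / q) * (rhs["gq"] - dq)
    dg = (g / h) * (rhs["gh"] - dh)
    dt = (t / s) * (rhs["gs"] - ds)
    return (dx, df, dp, dt, dg, dy, dv, dq, ds, dh), rhs


def residuals(A, point, step, rhs):
    """Left side minus right side for each of the ten linearized equations."""
    x, f, p, t, g, y, v, q, s, h = point
    dx, df, dp, dt, dg, dy, dv, dq, ds, dh = step
    return [
        A * dx + df - rhs["rho"],
        df + dp - rhs["alpha"],
        dx + dt - rhs["tau"],
        -dx + dg - rhs["nu"],
        A * dy + ds - dh - rhs["sigma"],
        dy + dq - dv - rhs["beta"],
        (f / v) * dv + df - rhs["gf"],
        (q / p) * dp + dq - rhs["gq"],
        (s / t) * dt + ds - rhs["gs"],
        (h / g) * dg + dh - rhs["gh"],
    ]


# Hypothetical data and current point, chosen only for illustration.
A, a, b, c, l, u = 2, 1, 5, 3, 0, 4
point = tuple(Fr(z) for z in (1, 2, 1, 1, 2, 1, 3, 1, 2, 1))  # x,f,p,t,g,y,v,q,s,h
step, rhs = newton_direction(A, a, b, c, l, u, point, Fr(1, 2))
print(residuals(A, point, step, rhs))
```

Exact rationals are used so that a zero residual means the algebra is right and is not an accident of rounding. A practical implementation would instead factor the sparse quasidefinite matrix in (20.14).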


Exercises 


20.1 The matrix

\[
B =
\begin{bmatrix}
2 & & -2 & & \\
 & 1 & & -1 & \\
-2 & & 2 & & \\
 & -1 & & 2 & -1 \\
 & & & -1 & 2
\end{bmatrix}
\]

is not positive definite but is positive semidefinite. Find a factorization
$B = LDL^T$, where $L$ is lower triangular with ones on the diagonal and
$D$ is a diagonal matrix with nonnegative diagonal elements. If such a
factorization exists for every symmetric positive semidefinite matrix, explain
why. If not, give a counterexample.


20.2  Show  that  the  sum  of  a  positive  definite  matrix  and  a  positive  semidefinite 
matrix  is  positive  definite. 


20.3 Permute the rows/columns of the matrix $B$ given in (20.6) so that the
diagonal elements from $B$ appear in the order 2, 3, 4, 5, 1. Compute an
$LDL^T$-factorization of this matrix.


20.4 Show that, if $B$ is symmetric and positive semidefinite, then
$|b_{ij}| \le \sqrt{b_{ii} b_{jj}}$ for all $i, j$.


Notes 

Most implementations of interior-point methods assume the problem to be formulated
with equality constraints. In this formulation, Lustig et al. (1994) give a
good  overview  of  the  performance  of  interior-point  algorithms  compared  with  the 
simplex  method. 

The  suggestion  that  it  is  better  to  solve  equations  in  the  KKT  form  instead  of 
normal  form  was  offered  independently  by  a  number  of  researchers  (Gill  et  al.  1992; 
Turner  1991;  Fourer  and  Mehrotra  1991;  Vanderbei  and  Carpenter  1993). 

The  advantages  of  the  primal-dual  symmetric  formulation  were  first  reported 
in  Vanderbei  (1994).  The  basic  properties  of  quasidefinite  matrices  were  first  given 
in  Vanderbei  (1995). 


CHAPTER  21 


The Affine-Scaling Method


In  the  previous  chapter,  we  showed  that  the  step  direction  for  the  path-following 
method  can  be  decomposed  into  a  linear  combination  of  three  directions:  a  direction 
toward  optimality,  a  direction  toward  feasibility,  and  a  direction  toward  centrality.  It 
turns  out  that  these  directions,  or  minor  variants  of  them,  arise  in  all  interior-point 
methods. 

Historically, one of the first interior-point methods to be invented, analyzed,
and implemented was a two-phase method in which Phase I uses only the feasibility
direction and Phase II uses only the optimality direction. This method is called
affine scaling. While it is no longer considered the method of choice for practical
implementations, it remains important because its derivation provides valuable
insight into the nature of the three basic directions mentioned above.

In this chapter, we shall explain the affine-scaling principle and use it to derive
the step toward optimality and step toward feasibility directions. As always, our
main interest lies in problems presented in standard form. But for affine scaling,
it is easier to start by considering problems in equality form. Hence, we begin by
assuming that the linear programming problem is given as

\[
\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & Ax = b \\
 & x \ge 0.
\end{array}
\tag{21.1}
\]

We shall begin with the Phase II algorithm. Hence, we assume that we have a
feasible initial starting point, $x^0$. For the affine-scaling method, it is important that
this starting point lie in the strict interior of the feasible set. That is, we assume that

\[
Ax^0 = b \quad\text{and}\quad x^0 > 0.
\]

1.  The  Steepest  Ascent  Direction 

Since the affine-scaling principle is fundamentally geometric, it is useful to
keep a picture in mind. A typical picture for $m = 1$ and $n = 3$ is shown in
Figure 21.1. The ultimate goal is, of course, to move from $x^0$ to the optimal
solution $x^*$. However, the short-term goal is to move from $x^0$ in some direction $\Delta x$
that improves the objective function. Such a direction is called an ascent direction.
You probably recall from elementary calculus that the best, i.e., steepest, ascent
direction is given by the gradient of the objective function. However, as we see in
Figure 21.1, there is no reason a priori for the gradient to “lie in” the feasible region.

Figure 21.1. A typical feasible region when the problem is in
equality form, $m = 1$, and $n = 3$. The lines drawn on the feasible
set represent level sets of the objective function $c^T x$, and $x^0$ represents
the starting point for the affine-scaling method.


Hence, the steepest ascent direction will almost surely cause a move to infeasible
points. This is also clear algebraically. Indeed, taking $\Delta x = c$,

\[
A(x^0 + \Delta x) = Ax^0 + A\Delta x = b + Ac \ne b
\]

(unless $Ac = 0$, which is not likely).

To  see  how  to  find  a  better  direction,  let  us  first  review  in  what  sense  the  gradient 
is  the  steepest  ascent  direction.  The  steepest  ascent  direction  is  defined  to  be  the 
direction  that  gives  the  greatest  increase  in  the  objective  function  subject  to  the 
constraint  that  the  displacement  vector  has  unit  length.  That  is,  the  steepest  ascent 
direction  is  the  solution  to  the  following  optimization  problem: 


\[
\begin{array}{ll}
\text{maximize} & c^T (x^0 + \Delta x) \\
\text{subject to} & \|\Delta x\|^2 = 1.
\end{array}
\tag{21.2}
\]


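We can solve this problem using Lagrange multipliers. Indeed, if we let $\lambda$ denote
the Lagrange multiplier for the constraint, the problem becomes

\[
\max_{\Delta x, \lambda} \; c^T (x^0 + \Delta x) - \lambda (\Delta x^T \Delta x - 1).
\]

Differentiating with respect to $\Delta x$ and setting the derivative to zero, we get

\[
c - 2\lambda \Delta x = 0,
\]

which implies that

\[
\Delta x = \frac{1}{2\lambda} c \propto c.
\]

Then differentiating the Lagrangian with respect to $\lambda$ and setting that derivative to
zero, we see that

\[
\|\Delta x\|^2 - 1 = 0,
\]

which implies that

\[
\|\Delta x\| = \pm 1.
\]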


Hence,  the  steepest  ascent  direction  points  in  the  direction  of  either  c  or  its  negative. 
Since  the  negative  is  easily  seen  not  to  be  an  ascent  direction  at  all,  it  follows  that 
the  steepest  ascent  direction  points  in  the  direction  of  c. 


2.  The  Projected  Gradient  Direction 


The problem with the steepest ascent direction is that it fails to preserve
feasibility. That is, it fails to preserve the equality constraints $Ax = b$. To remedy this
problem, let’s add these constraints to (21.2) so that we get the following
optimization problem:

\[
\begin{array}{ll}
\text{maximize} & c^T (x^0 + \Delta x) \\
\text{subject to} & \|\Delta x\|^2 = 1 \\
 & A(x^0 + \Delta x) = b.
\end{array}
\]


Again, the method of Lagrange multipliers is the appropriate tool. As before, let $\lambda$
denote the Lagrange multiplier for the norm constraint, and now introduce a vector
$y$ containing the Lagrange multipliers for the equality constraints. The resulting
unconstrained optimization problem is

\[
\max_{\Delta x, \lambda, y} \; c^T (x^0 + \Delta x) - \lambda (\Delta x^T \Delta x - 1) - y^T (A(x^0 + \Delta x) - b).
\]

Differentiating this Lagrangian with respect to $\Delta x$, $\lambda$, and $y$ and setting these
derivatives to zero, we get

\[
\begin{array}{rcl}
c - 2\lambda \Delta x - A^T y &=& 0 \\
\|\Delta x\|^2 - 1 &=& 0 \\
A(x^0 + \Delta x) - b &=& 0.
\end{array}
\]


The second equation tells us that the length of $\Delta x$ is one. Since we are interested
in the direction of $\Delta x$ and are not concerned about its length, we ignore this second
equation. The first equation tells us that $\Delta x$ is proportional to $c - A^T y$, and again,
since we aren’t concerned about lengths, we put $\lambda = 1/2$ so that the first equation
reduces to

\[
\Delta x = c - A^T y.
\tag{21.3}
\]

Since $Ax^0 = b$, the third equation says that

\[
A \Delta x = 0.
\]

Substituting (21.3) into this equation, we get

\[
Ac - AA^T y = 0,
\]

which, assuming that $AA^T$ has full rank (as it should), can be solved for $y$ to get

\[
y = (AA^T)^{-1} Ac.
\]

Now, substituting this expression into (21.3), we see that

\[
\Delta x = c - A^T (AA^T)^{-1} Ac.
\]

It is convenient to let $P$ be the matrix defined by

\[
P = I - A^T (AA^T)^{-1} A.
\]

With this definition, $\Delta x$ can be expressed succinctly as

\[
\Delta x = Pc.
\]

We claim that $P$ is the matrix that maps any vector, such as $c$, to its orthogonal
projection onto the null space of $A$. To justify this claim, we first need to define some
of the terms we’ve used. The null space of $A$ is defined as $\{d \in \mathbb{R}^n : Ad = 0\}$. We
shall denote the null space of $A$ by $N(A)$. A vector $\tilde{c}$ is the orthogonal projection
of $c$ onto $N(A)$ if it lies in the null space,

\[
\tilde{c} \in N(A),
\]

and if the difference between it and $c$ is orthogonal to every other vector in $N(A)$.
That is,

\[
d^T (c - \tilde{c}) = 0, \quad \text{for all } d \in N(A).
\]

Hence, to show that $Pc$ is the orthogonal projection of $c$ onto the null space of $A$,
we simply check these two conditions. Checking the first, we see that

\[
APc = Ac - AA^T (AA^T)^{-1} Ac,
\]

which clearly vanishes. To check the second condition, let $d$ be an arbitrary vector
in the null space, and compute

\[
d^T (c - Pc) = d^T A^T (AA^T)^{-1} Ac,
\]

which also vanishes, since $d^T A^T = (Ad)^T = 0$. The orthogonal projection $Pc$ is
shown in Figure 21.1.
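
The two defining properties of the projection can also be checked numerically. The Python sketch below is an illustration rather than a production routine: the helper names `solve` and `project_onto_null_space` are made up, the system with $AA^T$ is solved by a plain Gauss-Jordan elimination in exact rational arithmetic, and the data ($A = [\,1\ 1\ 2\,]$, $c = (2, 3, 2)$) are chosen here only as an example.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve M z = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for k in range(n):
        # Pick any row with a nonzero entry in column k as the pivot row.
        piv = next(i for i in range(k, n) if aug[i][k] != 0)
        aug[k], aug[piv] = aug[piv], aug[k]
        aug[k] = [v / aug[k][k] for v in aug[k]]
        for i in range(n):
            if i != k:
                factor = aug[i][k]
                aug[i] = [vi - factor * vk for vi, vk in zip(aug[i], aug[k])]
    return [row[n] for row in aug]


def project_onto_null_space(A, c):
    """Return Pc = c - A^T (A A^T)^{-1} A c for a full-row-rank matrix A."""
    m, n = len(A), len(c)
    AAt = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)]
           for i in range(m)]
    Ac = [sum(A[i][k] * c[k] for k in range(n)) for i in range(m)]
    y = solve(AAt, Ac)  # y = (A A^T)^{-1} A c
    return [Fraction(c[k]) - sum(A[i][k] * y[i] for i in range(m))
            for k in range(n)]


A = [[1, 1, 2]]  # a single equality constraint (m = 1, n = 3)
c = [2, 3, 2]
Pc = project_onto_null_space(A, c)
print(Pc)                                        # the projected gradient
print(sum(a * d for a, d in zip(A[0], Pc)))      # A(Pc), should be 0
```

The first property, $APc = 0$, is printed directly. The second can be tested by taking any $d$ with $Ad = 0$, for instance $d = (0, 2, -1)$, and confirming that $d^T(c - Pc) = 0$.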

3.  The  Projected  Gradient  Direction  with  Scaling 

The  orthogonal  projection  of  the  gradient  gives  a  good  step  direction  in  the 
sense  that  among  all  feasibility-preserving  directions,  it  gives  the  largest  rate  of 
increase  of  cTx  per  unit  step  length.  This  property  is  nice  as  long  as  the  current 
point  x°  is  well  inside  the  feasible  set.  But  if  it  is  close  to  a  “wall,”  the  overall 
increase  in  one  step  will  be  small,  since  even  though  the  rate  is  large  the  actual 
step  length  will  be  small,  yielding  a  small  overall  increase.  In  fact,  the  increase 
will  become  arbitrarily  small  as  the  point  x°  is  taken  closer  and  closer  to  a  “wall.” 
Hence,  to  get  a  reasonable  algorithm,  we  need  to  find  a  formula  for  computing  step 
directions  that  steers  away  from  walls  as  they  get  close. 

The affine-scaling algorithm achieves this effect as follows: scale the variables
in the problem so that the current feasible solution is far from the walls, compute the
step direction as the projected gradient in the scaled problem, and then translate this
direction back into the original system. The idea of scaling seems too simple to do
any good, and this is true if one tries the most naive scaling: just multiplying every
variable by one large number (such as the reciprocal of the smallest component of
$x^0$). Such a uniform scaling does not change the picture in any way. For example,
Figure 21.1, which doesn’t show specific scales on the coordinate axes, would not
change at all. Hence, whether distance is measured in miles or in feet, the property
of being close to a wall remains unchanged.

Figure 21.2. The effect of affine scaling on projected gradients.

Fortunately, the scaling we have in mind for the affine-scaling algorithm is just
slightly fancier. Indeed, the idea is to scale each variable in such a manner that its
initial value gets mapped to 1. That is, for each $j = 1, 2, \ldots, n$, we introduce new
variables given by

\[
\xi_j = \frac{x_j}{x_j^0}.
\]

Of course, this change of variable is trivial to undo:

\[
x_j = x_j^0 \xi_j.
\]

In matrix notation, the change of variables can be written as

\[
x = X^0 \xi.
\tag{21.4}
\]

Note that we are employing our usual convention of letting an upper-case letter stand
for a diagonal matrix whose diagonal elements come from the components of the
vector denoted by the corresponding lower-case letter. Clearly, under this change of
variables, the initial solution $x^0$ gets mapped to the vector $e$ of all ones, which is at
least one unit away from each wall. Figure 21.2 shows an example of this scaling
transformation. Note that, unlike the trivial scaling mentioned above, this scaling
changes the way the level sets of the objective cut across the feasible region.

Making the change of variables given by (21.4) in (21.1), we find that the problem
in the scaled space is

\[
\begin{array}{ll}
\text{maximize} & c^T X^0 \xi \\
\text{subject to} & AX^0 \xi = b \\
 & \xi \ge 0.
\end{array}
\]

Clearly, it is a linear programming problem in standard form with constraint matrix
$AX^0$ and vector of objective function coefficients $(c^T X^0)^T = X^0 c$. Letting $\Delta\xi$
denote the projected gradient of the objective function in this scaled problem, we
see that

\[
\Delta\xi = \left( I - X^0 A^T (A {X^0}^2 A^T)^{-1} A X^0 \right) X^0 c.
\]

Ignore step length worries for the moment, and consider moving from the current
solution $\xi^0 = e$ to the following new point in the scaled problem:

\[
\xi^1 = \xi^0 + \Delta\xi.
\]

Then transforming this new point back into the original unscaled variables, we get
a new point $x^1$ given by

\[
x^1 = X^0 \xi^1 = X^0 (e + \Delta\xi) = x^0 + X^0 \Delta\xi.
\]

Of course, the difference between $x^1$ and $x^0$ is the step direction in the original
variables. Denoting this difference by $\Delta x$, we see that

\[
\begin{array}{rcl}
\Delta x &=& X^0 \left( I - X^0 A^T (A {X^0}^2 A^T)^{-1} A X^0 \right) X^0 c \\
 &=& \left( D - D A^T (A D A^T)^{-1} A D \right) c,
\end{array}
\tag{21.5}
\]

where

\[
D = {X^0}^2.
\]

The  expression  for  Ax  given  by  (21.5)  is  called  the  affine-scaling  step  direction . 
Of  course,  to  construct  the  affine-scaling  algorithm  out  of  this  formula  for  a  step 
direction,  one  simply  needs  to  choose  step  lengths  in  such  a  manner  as  to  ensure 
“strict”  feasibility  of  each  iterate. 
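
As a check on the algebra, the scaled and unscaled routes to the step direction can be compared directly. The Python sketch below is illustrative only. It is restricted to a single equality constraint ($m = 1$), so that $A{X^0}^2A^T$ is a scalar, and the function name `affine_scaling_direction` is made up. The data are those of the small problem maximize $2x_1 + 3x_2 + 2x_3$ subject to $x_1 + x_2 + 2x_3 = 3$, and the interior point $x^0 = (1, 3/2, 1/4)$ is an assumption made for this example.

```python
from fractions import Fraction


def affine_scaling_direction(a, c, x0):
    """Return (d_xi, d_x) for one constraint a^T x = b, using exact fractions."""
    x0 = [Fraction(v) for v in x0]
    scaled_a = [ai * xi for ai, xi in zip(a, x0)]  # the row vector A X^0
    scaled_c = [ci * xi for ci, xi in zip(c, x0)]  # the scaled gradient X^0 c
    # With m = 1 the matrix A X^0^2 A^T is a scalar, so no linear solve is needed.
    y = (sum(p * q for p, q in zip(scaled_a, scaled_c))
         / sum(p * p for p in scaled_a))
    d_xi = [sc - y * sa for sc, sa in zip(scaled_c, scaled_a)]  # projected gradient
    d_x = [xi * di for xi, di in zip(x0, d_xi)]                 # undo the scaling
    return d_xi, d_x


a = [1, 1, 2]
c = [2, 3, 2]
x0 = [1, Fraction(3, 2), Fraction(1, 4)]  # assumed strictly interior point

d_xi, d_x = affine_scaling_direction(a, c, x0)

# Formula (21.5) written directly in the original variables, with D = X0^2.
D = [xi * xi for xi in x0]
y = (sum(ai * di * ci for ai, di, ci in zip(a, D, c))
     / sum(ai * ai * di for ai, di in zip(a, D)))
direct = [di * (ci - ai * y) for ai, di, ci in zip(a, D, c)]

print(d_xi)
print(d_x)
print(direct == d_x)
```

Both routes give the same vector, the direction satisfies $A\Delta x = 0$, and $c^T \Delta x > 0$, so it is a feasibility-preserving ascent direction.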

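We end this section by illustrating some of the calculations on the following
trivial example:

\[
\begin{array}{ll}
\text{maximize} & 2x_1 + 3x_2 + 2x_3 \\
\text{subject to} & x_1 + x_2 + 2x_3 = 3 \\
 & x_1,\ x_2,\ x_3 \ge 0.
\end{array}
\]

This is precisely the problem shown in Figure 21.2. As in the figure, let us assume
that

\[
x^0 = \begin{bmatrix} 1 \\ \frac{3}{2} \\ \frac{1}{4} \end{bmatrix}.
\]

For later comparison, we compute the projected gradient (without scaling):

\[
Pc = c - A^T (AA^T)^{-1} Ac
= \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}
- \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}
\left( \begin{bmatrix} 1 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \right)^{-1}
\begin{bmatrix} 1 & 1 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}
= \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix} - \frac{3}{2} \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}
= \begin{bmatrix} \frac{1}{2} \\ \frac{3}{2} \\ -1 \end{bmatrix}.
\]

Now, in the scaled coordinate system, the gradient of the objective function is

\[
X^0 c = \begin{bmatrix} 1 & & \\ & \frac{3}{2} & \\ & & \frac{1}{4} \end{bmatrix}
\begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}
= \begin{bmatrix} 2 \\ \frac{9}{2} \\ \frac{1}{2} \end{bmatrix}
\]

and the constraint matrix is given by

\[
AX^0 = \begin{bmatrix} 1 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 1 & & \\ & \frac{3}{2} & \\ & & \frac{1}{4} \end{bmatrix}
= \begin{bmatrix} 1 & \frac{3}{2} & \frac{1}{2} \end{bmatrix}.
\]

Using these, we compute $\Delta\xi$ as follows:

\[
\Delta\xi
= \begin{bmatrix} 2 \\ \frac{9}{2} \\ \frac{1}{2} \end{bmatrix}
- \begin{bmatrix} 1 \\ \frac{3}{2} \\ \frac{1}{2} \end{bmatrix}
\left( \begin{bmatrix} 1 & \frac{3}{2} & \frac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ \frac{3}{2} \\ \frac{1}{2} \end{bmatrix} \right)^{-1}
\begin{bmatrix} 1 & \frac{3}{2} & \frac{1}{2} \end{bmatrix} \begin{bmatrix} 2 \\ \frac{9}{2} \\ \frac{1}{2} \end{bmatrix}
= \begin{bmatrix} 2 \\ \frac{9}{2} \\ \frac{1}{2} \end{bmatrix} - \frac{18}{7} \begin{bmatrix} 1 \\ \frac{3}{2} \\ \frac{1}{2} \end{bmatrix}
= \begin{bmatrix} -\frac{4}{7} \\ \frac{9}{14} \\ -\frac{11}{14} \end{bmatrix}.
\]

Finally, $\Delta x$ is obtained by the inverse scaling:

\[
\Delta x = X^0 \Delta\xi
= \begin{bmatrix} 1 & & \\ & \frac{3}{2} & \\ & & \frac{1}{4} \end{bmatrix}
\begin{bmatrix} -\frac{4}{7} \\ \frac{9}{14} \\ -\frac{11}{14} \end{bmatrix}
= \begin{bmatrix} -\frac{4}{7} \\ \frac{27}{28} \\ -\frac{11}{56} \end{bmatrix}.
\]
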
In the next section, we discuss the convergence properties of the affine-scaling
algorithm.


4.  Convergence 

In the previous section, we derived the affine-scaling step direction $\Delta x$. To
make an algorithm, we need to introduce an associated step length. If the step
were chosen so that the new point were to lie exactly on the boundary of the
feasible region, then the multiplier for $\Delta x$, which as usual we denote by $\theta$, would be
given by

\[
\theta = \left( \max_j \left\{ -\frac{\Delta x_j}{x_j} \right\} \right)^{-1}.
\]

But as always, we need to shorten the step by introducing a parameter $0 < r < 1$
and setting

\[
\theta = r \left( \max_j \left\{ -\frac{\Delta x_j}{x_j} \right\} \right)^{-1}.
\]

With this choice of $\theta$, the iterations of the affine-scaling algorithm are defined by

\[
x \leftarrow x + \theta \Delta x.
\]
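
The iteration can be sketched in a few lines. The Python code below is an illustration under stated assumptions, not the book's software: it handles only a single equality constraint ($m = 1$), uses exact rational arithmetic so that feasibility is preserved exactly, and the function name `affine_scaling` is made up. The problem data (maximize $2x_1 + 3x_2 + 2x_3$ subject to $x_1 + x_2 + 2x_3 = 3$), the starting point $x = (1, 3/2, 1/4)$, the choice $r = 2/3$, and the iteration count are assumptions made for the example.

```python
from fractions import Fraction


def affine_scaling(a, b, c, x, r, iterations):
    """Phase II affine scaling for one constraint a^T x = b; returns all iterates."""
    x = [Fraction(v) for v in x]
    if sum(ai * xi for ai, xi in zip(a, x)) != b or min(x) <= 0:
        raise ValueError("starting point must be strictly interior and feasible")
    history = [x]
    for _ in range(iterations):
        d = [xi * xi for xi in x]  # diagonal of D = X^2
        y = (sum(ai * di * ci for ai, di, ci in zip(a, d, c))
             / sum(ai * ai * di for ai, di in zip(a, d)))
        dx = [di * (ci - ai * y) for ai, di, ci in zip(a, d, c)]  # direction (21.5)
        ratio = max(-dxi / xi for dxi, xi in zip(dx, x))
        if ratio <= 0:
            break  # no component decreases, so no finite step reaches a wall
        theta = r / ratio  # shortened step: fraction r of the way to the boundary
        x = [xi + theta * dxi for xi, dxi in zip(x, dx)]
        history.append(x)
    return history


a, b, c = [1, 1, 2], 3, [2, 3, 2]
start = [1, Fraction(3, 2), Fraction(1, 4)]
history = affine_scaling(a, b, c, start, r=Fraction(2, 3), iterations=6)

for x in history:
    objective = sum(ci * xi for ci, xi in zip(c, x))
    print([float(xi) for xi in x], float(objective))
```

Each iterate stays strictly positive and satisfies the equality constraint exactly, and the objective value increases at every step toward the optimal value 9, which is attained at $x^* = (0, 3, 0)$.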

It  turns  out  that  the  analysis  of  the  affine-scaling  algorithm  is  more  delicate 
than  the  analysis  of  the  path-following  algorithm  that  we  discussed  in  Chapter  18. 
Hence,  we  simply  state  without  proof  the  main  results. 

Theorem  21.1. 

(a)  If  the  problem  and  its  dual  are  nondegenerate,  then  for  every  r  <  1,  the 
sequence  generated  by  the  algorithm  converges  to  the  optimal  solution. 

(b)  For  r  <  2/3,  the  sequence  generated  by  the  algorithm  converges  to  an 
optimal  solution  (regardless  of  degeneracy). 

(c)  There  exists  an  example  and  an  associated  r  <  1  for  which  the  algorithm 
converges  to  a  nonoptimal  solution. 

There is only one example currently known for which the affine-scaling algorithm
fails by converging to a nonoptimal solution. For this example, the failure
occurs only for $r > 0.995$. It is not known whether there are examples of the
algorithm  failing  for  all  r  >  2/3,  although  such  a  worst-case  example  seems  likely 
to  exist. 

Convergence  is  only  the  first  question.  Once  convergence  is  established,  the 
follow-up  question  is:  how  fast?  For  example,  given  a  fixed  tolerance,  does  the 
affine-scaling  algorithm  produce  a  solution  within  this  tolerance  of  optimality  in 
a  number  of  iterations  that  is  bounded  by  a  polynomial  in  n?  Some  variants  of 
the  path-following  method  have  this  desirable  property,  so  one  would  hope  that  the 
affine-scaling method would share it. Unfortunately, while no one has written down
a detailed example yet, there is strong evidence that the affine-scaling method does
not have this property.

To explain the evidence, consider letting the step lengths in the affine-scaling
algorithm be extremely short, even infinitesimally short. In this case, the algorithm
no longer generates a sequence of points moving toward the optimal solution but
rather makes a smooth curve connecting the starting point to the optimal solution.
If we let the starting point vary, then we get a family of curves filling out the entire
interior of the feasible region and connecting each interior point to the optimal
solution. Figure 21.3 shows an example of a feasible region and some of the continuous
paths. Studying the continuous paths gives information about the discrete step
algorithm, since, for each point $x$, the step direction $\Delta x$ at $x$ is tangent to the continuous
path through $x$. The important property that the continuous paths illustrate is that
as one gets close to a face of the feasible polytope, the continuous path becomes
tangent to the face (see Exercise 21.1 for an algebraic verification of this statement).
This tangency holds for faces of all dimensions. In particular, it is true for edges.
Hence, if one starts close to an edge, then one gets a step that looks a lot like a
step of the simplex method. Therefore, it is felt that if one were to take a problem
that is bad for the simplex method, such as the Klee-Minty problem, and start the
affine-scaling algorithm in just the right place, then it would mimic the steps of the
simplex method and therefore take $2^n$ iterations to get close to the optimal solution.
This is the idea, but as noted above, no one has carried out the calculations.

Figure 21.3. A few continuous paths of the affine-scaling algorithm.
At every point, the continuous path is tangent to the step
direction $\Delta x$.


5.  Feasibility  Direction 

To derive a Phase I procedure for the affine-scaling algorithm, we consider a starting point $x^0$ that has strictly positive components but does not necessarily satisfy the equality constraints $Ax = b$. We then let

\[
\rho = b - A x^0
\]

denote the vector of infeasibilities. With these definitions under our belt, we introduce the following auxiliary problem involving one extra variable, which we shall denote by $x_0$ (not to be confused with the initial solution vector $x^0$):

\[
\begin{array}{ll}
\text{maximize} & -x_0 \\
\text{subject to} & Ax + x_0 \rho = b \\
 & x \ge 0, \quad x_0 \ge 0.
\end{array}
\]

Clearly, the vector

\[
\begin{bmatrix} x^0 \\ 1 \end{bmatrix}
\]

is a strictly positive feasible solution to the auxiliary problem. Hence, we can apply the affine-scaling algorithm to the auxiliary problem. If the optimal solution has $x_0^* > 0$, then the original problem is infeasible. If, on the other hand, the optimal solution has $x_0^* = 0$, then the optimal solution to the auxiliary problem provides a feasible starting solution to the original problem (it may not be a strictly interior feasible solution, but we shall ignore such technicalities in the present discussion).
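The construction of the auxiliary problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data `A`, `b`, `x0` and the helper names `mat_vec` and `auxiliary_problem` are hypothetical and do not come from the text or its accompanying software.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def auxiliary_problem(A, b, x0):
    """Build the Phase I problem: maximize -x_0 s.t. A x + x_0 rho = b."""
    Ax0 = mat_vec(A, x0)
    rho = [bi - v for bi, v in zip(b, Ax0)]        # rho = b - A x^0
    A_aux = [row + [r] for row, r in zip(A, rho)]  # constraint matrix [A rho]
    c_aux = [0.0] * len(x0) + [-1.0]               # objective vector [0; -1]
    start = list(x0) + [1.0]                       # starting point [x^0; 1]
    return A_aux, c_aux, start


# Illustrative data (an assumption, not taken from the book).
A = [[1.0, 2.0, 1.0], [3.0, 1.0, 0.0]]
b = [4.0, 5.0]
x0 = [1.0, 1.0, 1.0]  # strictly positive, but A x0 != b

A_aux, c_aux, start = auxiliary_problem(A, b, x0)
# Residual of the auxiliary equality constraints at the starting point.
residual = [v - bi for v, bi in zip(mat_vec(A_aux, start), b)]
```

Here `residual` vanishes and every entry of `start` is positive, which is exactly the claim that $[x^0; 1]$ is a strictly positive feasible solution of the auxiliary problem.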

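Let us now derive a specific formula for the step direction vector in the auxiliary problem. The vector of objective coefficients is

\[
\begin{bmatrix} 0 \\ -1 \end{bmatrix},
\]

the constraint matrix is

\[
\begin{bmatrix} A & \rho \end{bmatrix},
\]

and the "current" solution can be denoted as

\[
\begin{bmatrix} x \\ x_0 \end{bmatrix}.
\]

Substituting these three objects appropriately into (21.5), we get

\[
\begin{bmatrix} \Delta x \\ \Delta x_0 \end{bmatrix}
=
\left(
\begin{bmatrix} X^2 & \\ & x_0^2 \end{bmatrix}
-
\begin{bmatrix} X^2 & \\ & x_0^2 \end{bmatrix}
\begin{bmatrix} A^T \\ \rho^T \end{bmatrix}
\left(
\begin{bmatrix} A & \rho \end{bmatrix}
\begin{bmatrix} X^2 & \\ & x_0^2 \end{bmatrix}
\begin{bmatrix} A^T \\ \rho^T \end{bmatrix}
\right)^{-1}
\begin{bmatrix} A & \rho \end{bmatrix}
\begin{bmatrix} X^2 & \\ & x_0^2 \end{bmatrix}
\right)
\begin{bmatrix} 0 \\ -1 \end{bmatrix}.
\]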

Exploiting heavily the fact that all the coefficients in the objective vector are zero except for the last, we can write a fairly simple expression for $\Delta x$:

\[
\Delta x = X^2 A^T \left( A X^2 A^T + x_0^2 \rho \rho^T \right)^{-1} \rho \, x_0^2 .
\]


The final simplification comes from applying the Sherman-Morrison-Woodbury formula (see Exercise 19.1) to the inverted expression above to discover that the vector $(AX^2A^T + x_0^2 \rho\rho^T)^{-1}\rho$ points in the same direction as $(AX^2A^T)^{-1}\rho$. That is, there is a positive scalar $\alpha$ such that

\[
(AX^2A^T + x_0^2 \rho\rho^T)^{-1} \rho = \alpha \, (AX^2A^T)^{-1} \rho
\]

(see Exercise 21.3). Since vector lengths are irrelevant, we can define the affine-scaling feasibility step direction as

\[
\Delta x = X^2 A^T (AX^2A^T)^{-1} \rho. \tag{21.6}
\]
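One consequence of (21.6) that is easy to check numerically is that $A \Delta x = AX^2A^T(AX^2A^T)^{-1}\rho = \rho$, so a full step along the feasibility direction would remove the infeasibility entirely (positivity permitting). The sketch below verifies this on a small made-up instance. The data and the helper names `solve`, `scaled_gram`, and `feasibility_direction` are assumptions for illustration, and the dense Gaussian elimination stands in for whatever linear solver one would use in practice.

```python
def solve(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[i][k] -= f * aug[col][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = (aug[i][n] - s) / aug[i][i]
    return y


def scaled_gram(A, x):
    """Return A X^2 A^T for X = diag(x)."""
    m, n = len(A), len(x)
    return [[sum(A[i][j] * x[j] ** 2 * A[k][j] for j in range(n))
             for k in range(m)] for i in range(m)]


def feasibility_direction(A, b, x):
    """Return (dx, rho) with dx = X^2 A^T (A X^2 A^T)^{-1} rho, as in (21.6)."""
    rho = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
    y = solve(scaled_gram(A, x), rho)  # y = (A X^2 A^T)^{-1} rho
    dx = [x[j] ** 2 * sum(A[i][j] * y[i] for i in range(len(A)))
          for j in range(len(x))]      # dx = X^2 A^T y
    return dx, rho


# Illustrative data (an assumption, not taken from the book).
A = [[1.0, 2.0, 1.0], [3.0, 1.0, 0.0]]
b = [4.0, 5.0]
x = [1.0, 1.0, 1.0]

dx, rho = feasibility_direction(A, b, x)
A_dx = [sum(a * d for a, d in zip(row, dx)) for row in A]
```

In this instance `A_dx` agrees with `rho` up to rounding. In an actual Phase I iteration one would still shorten the step so that the new point stays strictly positive.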


6.  Problems  in  Standard  Form 

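We return now to problems in standard form:

\[
\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & Ax \le b \\
 & x \ge 0.
\end{array}
\]

Introducing slack variables $w$, we can write the problem equivalently as

\[
\begin{array}{ll}
\text{maximize} & \begin{bmatrix} c \\ 0 \end{bmatrix}^T \begin{bmatrix} x \\ w \end{bmatrix} \\[2ex]
\text{subject to} & \begin{bmatrix} A & I \end{bmatrix} \begin{bmatrix} x \\ w \end{bmatrix} = b \\[2ex]
 & \begin{bmatrix} x \\ w \end{bmatrix} \ge 0.
\end{array}
\tag{21.7}
\]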

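Writing down the affine-scaling step direction for this problem, we get

\[
\begin{bmatrix} \Delta x \\ \Delta w \end{bmatrix}
=
\left(
\begin{bmatrix} X^2 & \\ & W^2 \end{bmatrix}
-
\begin{bmatrix} X^2 & \\ & W^2 \end{bmatrix}
\begin{bmatrix} A^T \\ I \end{bmatrix}
\left(
\begin{bmatrix} A & I \end{bmatrix}
\begin{bmatrix} X^2 & \\ & W^2 \end{bmatrix}
\begin{bmatrix} A^T \\ I \end{bmatrix}
\right)^{-1}
\begin{bmatrix} A & I \end{bmatrix}
\begin{bmatrix} X^2 & \\ & W^2 \end{bmatrix}
\right)
\begin{bmatrix} c \\ 0 \end{bmatrix},
\]

which simplifies to

\[
\begin{bmatrix} \Delta x \\ \Delta w \end{bmatrix}
=
\begin{bmatrix} X^2 c \\ 0 \end{bmatrix}
-
\begin{bmatrix} X^2 A^T \\ W^2 \end{bmatrix}
(AX^2A^T + W^2)^{-1} A X^2 c.
\]

Therefore, in particular

\[
\Delta x = X^2 c - X^2 A^T (AX^2A^T + W^2)^{-1} A X^2 c.
\]

Note that this formula for $\Delta x$ matches the formula for $\Delta x_{\mathrm{OPT}}$ given in Section 19.3, except that the diagonal matrix $X^2$ replaces $XZ^{-1}$ and $W^2$ replaces $WY^{-1}$. These diagonal matrices are referred to as scaling matrices. Hence, the formula for $\Delta x$ given above is often called the affine-scaling step direction with primal scaling, whereas $\Delta x_{\mathrm{OPT}}$ is referred to as the affine-scaling step direction with primal-dual scaling.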

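Similar connections can be established between the Phase I step direction derived in the previous section and $\Delta x_{\mathrm{FEAS}}$ from Section 19.3. Indeed, from (21.6), we see that the feasibility step direction for (21.7) is

\[
\begin{bmatrix} \Delta x \\ \Delta w \end{bmatrix}
=
\begin{bmatrix} X^2 & \\ & W^2 \end{bmatrix}
\begin{bmatrix} A^T \\ I \end{bmatrix}
\left(
\begin{bmatrix} A & I \end{bmatrix}
\begin{bmatrix} X^2 & \\ & W^2 \end{bmatrix}
\begin{bmatrix} A^T \\ I \end{bmatrix}
\right)^{-1}
\left( b - \begin{bmatrix} A & I \end{bmatrix} \begin{bmatrix} x \\ w \end{bmatrix} \right).
\]

Again, looking just at the formula for $\Delta x$, we see that

\[
\Delta x = X^2 A^T (AX^2A^T + W^2)^{-1} (b - Ax - w),
\]

which coincides with $\Delta x_{\mathrm{FEAS}}$ except that $X^2$ replaces $XZ^{-1}$ and $W^2$ replaces $WY^{-1}$.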


Exercises 


21.1 Step direction becomes tangent to each facet. Let $\Delta x$ denote the affine-scaling step direction given by

\[
\Delta x = \left( X^2 - X^2 A^T (AX^2A^T)^{-1} A X^2 \right) c.
\]

This step direction is clearly a function of $x$. Fix $j$. Show that the limit as $x_j$ tends to zero of $\Delta x$ is a vector whose $j$th component vanishes. That is,

\[
\lim_{x_j \to 0} \Delta x_j = 0.
\]


21.2 Dual Estimates. Consider the following function, defined in the interior of the polytope of feasible solutions $\{x : Ax = b,\ x \ge 0\}$ by

\[
y(x) = (AX^2A^T)^{-1} A X^2 c.
\]

Consider a partition of the columns of $A = \begin{bmatrix} B & N \end{bmatrix}$ into a basic part $B$ and a nonbasic part $N$, and, as we did in our study of the simplex method, partition the $n$-vectors analogously. Show that

\[
\lim_{x_N \to 0} y(x) = (B^T)^{-1} c_B.
\]


21.3 Let $A$ be an $m \times n$ matrix having rank $m$, and let $\rho$ be an arbitrary $m$-vector. Use the identity proved in Exercise 19.1 to show that there exists a scalar $\alpha$ such that

\[
(AA^T + \rho\rho^T)^{-1} \rho = \alpha \, (AA^T)^{-1} \rho.
\]

Hint: Be mindful of which matrices are invertible.

21.4 (So-called) Dual Affine-Scaling Method. Compute the affine-scaling step-direction vector $\Delta x$ for problems in the following form:

\[
\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & Ax \le b.
\end{array}
\]


Notes

The affine-scaling algorithm was first suggested by Dikin (1967). He subsequently published a convergence analysis in Dikin (1974). Dikin’s work went largely unnoticed for many years until several researchers independently rediscovered the affine-scaling algorithm as a simple variant of Karmarkar’s algorithm (Karmarkar 1984). Of these independent rediscoveries, only two papers offered a convergence analysis: one by Barnes (1986) and the other by Vanderbei et al. (1986). It is interesting to note that Karmarkar himself was one of the independent rediscoverers, but he mistakenly believed that the algorithm enjoyed the same convergence properties as his algorithm (i.e., that it would get within any fixed tolerance of optimality within a specific number of iterations bounded by a polynomial in $n$).

Theorem  21.1(a)  was  proved  by  Vanderbei  et  al.  (1986).  Part  (b)  of  the  Theorem 
was  proved  by  Tsuchiya  and  Muramatsu  (1992)  who  also  show  that  the  result  is 
sharp.  A  sharper  sharpness  result  can  be  found  in  Hall  and  Vanderbei  (1993).  Part 
(c)  of  the  Theorem  was  established  by  Mascarenhas  (1997). 

The  first  derivation  of  the  affine-scaling  feasibility  step  direction  was  given  by 
Vanderbei  (1989).  The  simple  derivation  given  in  Section  21.5  is  due  to  M.  Meketon. 

A  recent  book  by  Saigal  (1995)  contains  an  extensive  treatment  of  affine-scaling 
methods. 


CHAPTER  22 


The  Homogeneous  Self-Dual  Method 


In  Chapter  18,  we  described  and  analyzed  an  interior-point  method  called  the 
path-following  algorithm.  This  algorithm  is  essentially  what  one  implements  in 
practice  but  as  we  saw  in  the  section  on  convergence  analysis,  it  is  not  easy  (and 
perhaps  not  possible)  to  give  a  complete  proof  that  the  method  converges  to  an 
optimal solution. If convergence were completely established, the question would
still remain as to how fast the convergence is. In this chapter, we shall present a
similar  algorithm  for  which  a  complete  convergence  analysis  can  be  given. 


1.  From  Standard  Form  to  Self-Dual  Form 


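As always, we are interested in a linear programming problem given in standard form

\[
\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & Ax \le b \\
 & x \ge 0
\end{array}
\tag{22.1}
\]

and its dual

\[
\begin{array}{ll}
\text{minimize} & b^T y \\
\text{subject to} & A^T y \ge c \\
 & y \ge 0.
\end{array}
\tag{22.2}
\]

As we shall show, these two problems can be solved by solving the following problem, which essentially combines the primal and dual problems into one problem:

\[
\begin{array}{lrcl}
\text{maximize} & 0 & & \\
\text{subject to} & -A^T y + c\phi & \le & 0, \\
 & Ax - b\phi & \le & 0, \\
 & -c^T x + b^T y & \le & 0, \\
 & x,\ y,\ \phi & \ge & 0.
\end{array}
\tag{22.3}
\]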

Note that, beyond combining the primal and dual into one big problem, one new
variable ($\phi$) and one new constraint have been added. Hence, the total number of
variables  in  (22.3)  is  n  +  m  +  1  and  the  total  number  of  constraints  is  n  +  m  +  1. 
Furthermore,  the  objective  function  and  the  right-hand  sides  all  vanish.  Problems 
with  such  right-hand  sides  are  called  homogeneous.  Also,  the  constraint  matrix 
for  problem  (22.3)  is  skew  symmetric.  That  is,  it  is  equal  to  the  negative  of  its 
transpose.  Homogeneous  linear  programming  problems  having  a  skew  symmetric 
constraint  matrix  are  called  self-dual. 
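To make the skew-symmetry claim concrete, the following sketch assembles the constraint matrix of (22.3) for a small instance, ordering the variables as $(x, y, \phi)$, and checks that it equals the negative of its transpose. The function names `self_dual_matrix` and `is_skew_symmetric` and the data are assumptions chosen for illustration.

```python
def self_dual_matrix(A, b, c):
    """Constraint matrix of (22.3) with variables ordered as (x, y, phi)."""
    m, n = len(A), len(c)
    K = []
    for j in range(n):  # rows of  -A^T y + c*phi <= 0
        K.append([0.0] * n + [-A[i][j] for i in range(m)] + [c[j]])
    for i in range(m):  # rows of   A x - b*phi <= 0
        K.append(list(A[i]) + [0.0] * m + [-b[i]])
    # last row:  -c^T x + b^T y <= 0
    K.append([-cj for cj in c] + list(b) + [0.0])
    return K


def is_skew_symmetric(K):
    """True when K equals the negative of its transpose."""
    size = len(K)
    return all(K[i][j] == -K[j][i] for i in range(size) for j in range(size))


# Illustrative data (an assumption): maximize x1 + 2 x2 s.t. x1 + x2 <= 2, x >= 0.
A = [[1.0, 1.0]]
b = [2.0]
c = [1.0, 2.0]

K = self_dual_matrix(A, b, c)
size = len(K)           # n + m + 1 rows and columns
skew = is_skew_symmetric(K)
```

For this instance `K` is a 4 by 4 matrix (n + m + 1 = 2 + 1 + 1) and `skew` is `True`.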


R.J. Vanderbei, Linear Programming, International Series in Operations Research
& Management Science 196, DOI 10.1007/978-1-4614-7630-6_22,

©  Springer  Science+Business  Media  New  York  2014 




In the next section, we shall give an algorithm for the solution of homogeneous self-dual linear programming problems. But first, let’s note that a solution to (22.3) in which $\phi > 0$ can be converted into solutions for (22.1) and (22.2). Indeed, let $(\bar{x}, \bar{y}, \bar{\phi})$ be an optimal solution to problem (22.3). Suppose that $\bar{\phi} > 0$. (The algorithm given in the next section will guarantee that this property is satisfied whenever (22.1) and (22.2) have optimal solutions.¹) Put

\[
x^* = \bar{x}/\bar{\phi} \quad \text{and} \quad y^* = \bar{y}/\bar{\phi}.
\]

Then the constraints in (22.3) say that

\[
\begin{array}{rcl}
-A^T y^* + c & \le & 0, \\
A x^* - b & \le & 0, \\
-c^T x^* + b^T y^* & \le & 0.
\end{array}
\]

Also, $x^*$ and $y^*$ are both nonnegative. Therefore, $x^*$ is feasible for (22.1) and $y^*$ is feasible for (22.2). From the weak duality theorem together with the third inequality above, we get

\[
c^T x^* = b^T y^*.
\]

Therefore, $x^*$ is optimal for the primal problem (22.1) and $y^*$ is optimal for the dual problem (22.2). As we will see later, the case where $\bar{\phi} = 0$ corresponds to infeasibility of either the primal or the dual problem (or both).
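The rescaling step can be illustrated on a tiny example. The sketch below assumes a hand-picked solution of (22.3) with positive $\phi$ for made-up data; the helper names `recover_primal_dual` and `dot` are hypothetical.

```python
def recover_primal_dual(x_bar, y_bar, phi_bar):
    """Scale a solution of (22.3) with phi_bar > 0 back to (22.1) and (22.2)."""
    if phi_bar <= 0:
        raise ValueError("phi_bar must be positive to recover solutions")
    x_star = [v / phi_bar for v in x_bar]
    y_star = [v / phi_bar for v in y_bar]
    return x_star, y_star


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Illustrative data (an assumption): maximize x1 + x2 s.t. x1 + x2 <= 2, x >= 0.
A = [[1.0, 1.0]]
b = [2.0]
c = [1.0, 1.0]

# A hand-picked solution of the homogeneous problem (22.3) for this data:
# -A^T y + c*phi = 0, A x - b*phi = 0, and -c^T x + b^T y = 0.
x_bar, y_bar, phi_bar = [1.0, 3.0], [2.0], 2.0

x_star, y_star = recover_primal_dual(x_bar, y_bar, phi_bar)
primal_value = dot(c, x_star)
dual_value = dot(b, y_star)
```

Here `x_star` is `[0.5, 1.5]`, `y_star` is `[1.0]`, and the primal and dual objective values coincide at 2, as the weak duality argument above requires.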


2.  Homogeneous  Self-Dual  Problems 


Consider a linear programming problem in standard form

\[
\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & Ax \le b \\
 & x \ge 0
\end{array}
\]

and its dual

\[
\begin{array}{ll}
\text{minimize} & b^T y \\
\text{subject to} & A^T y \ge c \\
 & y \ge 0.
\end{array}
\]

Such a linear programming problem is called self-dual if $m = n$, $A = -A^T$, and $b = -c$. The reason for the name is that the dual of such a problem is the same as the primal. To see this, rewrite the constraints as less-thans and then use the defining properties for self-duality to get

\[
A^T y \ge c \iff -A^T y \le -c \iff A y \le b.
\]

Similarly, writing the objective function as a maximization, we get

\[
\min b^T y = -\max -b^T y = -\max c^T y.
\]

Hence, ignoring the (irrelevant) fact that the dual records the negative of the objective function, the primal and the dual are seen to be the same. A linear programming problem in which the right-hand side vanishes is called a homogeneous problem. It follows that if a problem is homogeneous and self-dual, then its objective function must vanish too.

¹ The astute reader might notice that setting all variables to 0 produces an optimal solution.

For the remainder of this section, we assume that the problem under consideration is homogeneous and self-dual. Since the case $m = n = 1$ is trivial ($A = 0$ in this case), we assume throughout this section that $n \ge 2$. Also, since the dual is the same problem as the primal, we prefer to use the letter $z$ for the primal slacks (instead of the usual $w$). Hence, the primal can be written as

\[
\begin{array}{ll}
\text{maximize} & 0 \\
\text{subject to} & Ax + z = 0 \\
 & x,\ z \ge 0.
\end{array}
\tag{22.4}
\]

The following theorem establishes some of the important properties of homogeneous self-dual problems.

Theorem 22.1. For homogeneous self-dual problem (22.4), the following statements hold:

(1) It has feasible solutions and every feasible solution is optimal.

(2) The set of feasible solutions has empty interior. In fact, if $(x, z)$ is feasible, then $z^T x = 0$.

Proof. (1) The trivial solution, $(x, z) = (0, 0)$, is feasible. Since the objective function is zero, every feasible solution is optimal.

(2) Suppose that $(x, z)$ is feasible for (22.4). The fact that $A$ is skew symmetric implies that $\xi^T A \xi = 0$ for every vector $\xi$ (see Exercise 16.1). In particular, $x^T A x = 0$. Therefore, multiplying $Ax + z = 0$ on the left by $x^T$, we get $0 = x^T A x + x^T z = x^T z$. This completes the proof. □
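The skew-symmetry fact used in part (2) takes one line to check. The following is the standard argument, offered as a sketch and not quoted from Exercise 16.1: a scalar equals its own transpose, and $A^T = -A$.

```latex
\[
\xi^T A \xi \;=\; \left(\xi^T A \xi\right)^T \;=\; \xi^T A^T \xi \;=\; -\,\xi^T A \xi
\quad\Longrightarrow\quad \xi^T A \xi = 0 .
\]
```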

Part  (2)  of  the  previous  Theorem  tells  us  that  homogeneous  self-dual  problems 
do  not  have  central  paths. 


2.1. Step Directions. As usual, the interior-point method we shall derive will have the property that the intermediate solutions it produces will be infeasible. Hence, let

\[
\rho(x, z) = Ax + z
\]

denote the infeasibility of a solution $(x, z)$. Also, let

\[
\mu(x, z) = \frac{1}{n} x^T z.
\]

The number $\mu(x, z)$ measures the degree of noncomplementarity between $x$ and $z$. When $x$ and $z$ are clear from context, we shall simply write $\rho$ for $\rho(x, z)$ and $\mu$ for $\mu(x, z)$.

Step directions $(\Delta x, \Delta z)$ are chosen to reduce the infeasibility and noncomplementarity of the current solution by a given factor $\delta$, $0 \le \delta \le 1$. Hence, we consider the nonlinear system that would make the infeasibility and noncomplementarity of $(x + \Delta x, z + \Delta z)$ be $\delta$ times that of $(x, z)$:

\[
\begin{aligned}
A(x + \Delta x) + (z + \Delta z) &= \delta (Ax + z), \\
(X + \Delta X)(Z + \Delta Z) e &= \delta \mu(x, z) e.
\end{aligned}
\]

As usual, this system is nonlinear in the "delta" variables. Dropping the nonlinear term (appearing only in the second equation), we get the following linear system of equations for the step directions:

\[
A \Delta x + \Delta z = -(1 - \delta) \rho(x, z), \tag{22.5}
\]

\[
Z \Delta x + X \Delta z = \delta \mu(x, z) e - XZe. \tag{22.6}
\]

With these step directions, we pick a step length $\theta$ and step to a new point:

\[
\bar{x} = x + \theta \Delta x, \qquad \bar{z} = z + \theta \Delta z.
\]

We denote the new $\rho$-vector by $\bar{\rho}$ and the new $\mu$-value by $\bar{\mu}$:

\[
\bar{\rho} = \rho(\bar{x}, \bar{z}) \quad \text{and} \quad \bar{\mu} = \mu(\bar{x}, \bar{z}).
\]

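The following theorem establishes some of the properties of these step directions.

Theorem 22.2. The following relations hold:

(1) $\Delta z^T \Delta x = 0$.

(2) $\bar{\rho} = (1 - \theta + \theta\delta) \rho$.

(3) $\bar{\mu} = (1 - \theta + \theta\delta) \mu$.

(4) $\bar{X}\bar{Z}e - \bar{\mu}e = (1 - \theta)(XZe - \mu e) + \theta^2 \Delta X \Delta Z e$.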

Proof. (1) We start by multiplying both sides of (22.5) on the left by $\Delta x^T$:

\[
\Delta x^T A \Delta x + \Delta x^T \Delta z = -(1 - \delta) \Delta x^T \rho. \tag{22.7}
\]

The skew symmetry of $A$ (i.e., $A = -A^T$) implies that $\Delta x^T A \Delta x = 0$ (see Exercise 16.1). Hence, the left-hand side of (22.7) simplifies nicely:

\[
\Delta x^T A \Delta x + \Delta x^T \Delta z = \Delta x^T \Delta z.
\]

Substituting the definition of $\rho$ into the right-hand side of (22.7), we get

\[
-(1 - \delta) \Delta x^T \rho = -(1 - \delta) \Delta x^T (Ax + z).
\]

Next, we use the skew symmetry of $A$ to rewrite $\Delta x^T A x$ as follows:

\[
\Delta x^T A x = (Ax)^T \Delta x = x^T A^T \Delta x = -x^T A \Delta x.
\]

Assembling what we have so far, we see that

\[
\Delta x^T \Delta z = -(1 - \delta)\left( -x^T A \Delta x + z^T \Delta x \right). \tag{22.8}
\]

To proceed, we use (22.5) to replace $A \Delta x$ with $-(1 - \delta)\rho - \Delta z$. Therefore,

\[
\begin{aligned}
-x^T A \Delta x + z^T \Delta x &= x^T \left( (1 - \delta)\rho + \Delta z \right) + z^T \Delta x \\
&= (1 - \delta) x^T \rho + x^T \Delta z + z^T \Delta x.
\end{aligned}
\tag{22.9}
\]

Again using the definition of $\rho$ and the skew symmetry of $A$, we see that

\[
x^T \rho = x^T (Ax + z) = x^T z.
\]

The last two terms in (22.9) can be simplified by multiplying both sides of (22.6) on the left by $e^T$ and then using the definition of $\mu$ to see that

\[
z^T \Delta x + x^T \Delta z = \delta \mu n - x^T z = (\delta - 1) x^T z.
\]

Making these substitutions in (22.9), we get

\[
-x^T A \Delta x + z^T \Delta x = (1 - \delta) x^T z + (\delta - 1) x^T z = 0.
\]

Hence, from (22.8), we see that $\Delta x^T \Delta z$ vanishes as claimed.

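(2) From the definitions of $\bar{x}$ and $\bar{z}$, we see that

\[
\begin{aligned}
\bar{\rho} &= A(x + \theta \Delta x) + (z + \theta \Delta z) \\
&= Ax + z + \theta (A \Delta x + \Delta z) \\
&= (1 - \theta + \theta\delta) \rho.
\end{aligned}
\]

(3) From the definitions of $\bar{x}$ and $\bar{z}$, we see that

\[
\begin{aligned}
\bar{x}^T \bar{z} &= (x + \theta \Delta x)^T (z + \theta \Delta z) \\
&= x^T z + \theta (z^T \Delta x + x^T \Delta z) + \theta^2 \Delta z^T \Delta x.
\end{aligned}
\]

From part (1) and (22.6), we then get

\[
\bar{x}^T \bar{z} = x^T z + \theta (\delta \mu n - x^T z).
\]

Therefore,

\[
\bar{\mu} = \frac{1}{n} \bar{x}^T \bar{z} = (1 - \theta)\mu + \theta\delta\mu.
\]

(4) From the definitions of $\bar{x}$ and $\bar{z}$ together with part (3), we see that

\[
\begin{aligned}
\bar{X}\bar{Z}e - \bar{\mu}e &= (X + \theta \Delta X)(Z + \theta \Delta Z)e - (1 - \theta + \theta\delta)\mu e \\
&= XZe + \theta (Z \Delta x + X \Delta z) + \theta^2 \Delta X \Delta Z e - (1 - \theta + \theta\delta)\mu e.
\end{aligned}
\]

Substituting (22.6) into the second term on the right and recollecting terms, we get the desired expression. □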
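The four relations of Theorem 22.2 can be checked numerically on a small skew-symmetric instance. The sketch below is an illustration under assumptions: the matrix `A`, the point `(x, z)`, and the values of `delta` and `theta` are made up, and the step direction is obtained by solving (22.6) for $\Delta z$ and substituting into (22.5), which is one of several possible ways to organize the computation.

```python
def solve(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[i][k] -= f * aug[col][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = (aug[i][n] - s) / aug[i][i]
    return y


def infeasibility(A, x, z):
    """rho(x, z) = A x + z."""
    return [sum(a * xj for a, xj in zip(row, x)) + zi for row, zi in zip(A, z)]


def noncomplementarity(x, z):
    """mu(x, z) = x^T z / n."""
    return sum(xj * zj for xj, zj in zip(x, z)) / len(x)


def step_direction(A, x, z, delta):
    """Solve (22.5)-(22.6) by eliminating dz; returns (dx, dz)."""
    n = len(x)
    rho, mu = infeasibility(A, x, z), noncomplementarity(x, z)
    M = [[A[i][j] - (z[i] / x[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    rhs = [-(1 - delta) * rho[i] + z[i] - delta * mu / x[i] for i in range(n)]
    dx = solve(M, rhs)
    dz = [-(z[i] / x[i]) * dx[i] + delta * mu / x[i] - z[i] for i in range(n)]
    return dx, dz


# Illustrative data (an assumption): a 3 x 3 skew-symmetric matrix.
A = [[0.0, 1.0, -2.0], [-1.0, 0.0, 3.0], [2.0, -3.0, 0.0]]
x, z = [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]
delta, theta = 0.5, 0.3

dx, dz = step_direction(A, x, z, delta)
x_new = [a + theta * d for a, d in zip(x, dx)]
z_new = [a + theta * d for a, d in zip(z, dz)]
rho, mu = infeasibility(A, x, z), noncomplementarity(x, z)
rho_new, mu_new = infeasibility(A, x_new, z_new), noncomplementarity(x_new, z_new)
factor = 1 - theta + theta * delta

err1 = abs(sum(a * b for a, b in zip(dx, dz)))                   # part (1)
err2 = max(abs(rn - factor * r) for rn, r in zip(rho_new, rho))  # part (2)
err3 = abs(mu_new - factor * mu)                                 # part (3)
err4 = max(abs((x_new[j] * z_new[j] - mu_new)
               - ((1 - theta) * (x[j] * z[j] - mu)
                  + theta ** 2 * dx[j] * dz[j]))
           for j in range(len(x)))                               # part (4)
```

All four error quantities should be zero up to rounding. The matrix $A - X^{-1}Z$ solved here is nonsingular whenever $x$ and $z$ are strictly positive, since $v^T (A - X^{-1}Z) v = -v^T X^{-1} Z v < 0$ for every nonzero $v$.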


2.2. Predictor-Corrector Algorithm. With the preliminaries behind us, we are now ready to describe an algorithm. We shall be more conservative than we were in Chapter 18 and define the algorithm in such a way that it keeps the components of $XZe$ close to each other. Indeed, for each $0 \le \beta \le 1$, let

\[
\mathcal{N}(\beta) = \{ (x, z) > 0 : \| XZe - \mu(x, z) e \| \le \beta \mu(x, z) \}.
\]

Shortly, we will only deal with $\mathcal{N}(1/4)$ and $\mathcal{N}(1/2)$ but first let us note generally that $\beta < \beta'$ implies that $\mathcal{N}(\beta) \subset \mathcal{N}(\beta')$. Hence, as a function of $\beta$, the $\mathcal{N}(\beta)$’s form an increasing family of sets. Also, $\mathcal{N}(0)$ is precisely the set of points $(x, z)$ for which $XZe$ has all equal components.
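A membership test for these neighborhoods is short to write down. The sketch below uses the Euclidean norm, as in the lemma that follows; the function names `mu` and `in_neighborhood` and the sample points are assumptions for illustration.

```python
import math


def mu(x, z):
    """Noncomplementarity measure mu(x, z) = x^T z / n."""
    return sum(xj * zj for xj, zj in zip(x, z)) / len(x)


def in_neighborhood(x, z, beta):
    """Return True when (x, z) > 0 and ||XZe - mu e|| <= beta * mu."""
    if any(v <= 0 for v in x) or any(v <= 0 for v in z):
        return False
    m = mu(x, z)
    dist = math.sqrt(sum((xj * zj - m) ** 2 for xj, zj in zip(x, z)))
    return dist <= beta * m


# Illustrative points (assumptions).
centered = in_neighborhood([1.0, 1.0], [1.0, 1.0], 0.0)   # XZe = e
tight = in_neighborhood([1.0, 1.0], [1.0, 2.0], 0.25)     # XZe = (1, 2)
loose = in_neighborhood([1.0, 1.0], [1.0, 2.0], 0.5)
```

For the second point, $\mu = 1.5$ and $\|XZe - \mu e\| = \sqrt{0.5} \approx 0.707$, which exceeds $0.25\mu = 0.375$ but not $0.5\mu = 0.75$. The point therefore lies in $\mathcal{N}(1/2)$ but not in $\mathcal{N}(1/4)$, consistent with the sets increasing in $\beta$.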

The algorithm alternates between two types of steps. On the first iteration and subsequently on every other iteration, the algorithm performs a predictor step. Before a predictor step, one assumes that

\[
(x, z) \in \mathcal{N}(1/4).
\]

Then step directions are computed using $\delta = 0$ (i.e., with no centering) and the step length is calculated so as not to go outside of $\mathcal{N}(1/2)$:

\[
\theta = \max\{ t : (x + t \Delta x, z + t \Delta z) \in \mathcal{N}(1/2) \}. \tag{22.10}
\]

On the even iterations, the algorithm performs a corrector step. Before a corrector step, one assumes that

\[
(x, z) \in \mathcal{N}(1/2)
\]

(as is guaranteed by the predictor step’s step length). Then step directions are computed using $\delta = 1$ (i.e., pure centering) and the step length parameter $\theta$ is set to 1.

The following theorem shows that the result of each step satisfies the precondition for the next step of the algorithm and that $\mu$ decreases on predictor steps while it stays the same on corrector steps.

Theorem 22.3. The following statements are true:

(1) After a predictor step, $(\bar{x}, \bar{z}) \in \mathcal{N}(1/2)$ and $\bar{\mu} = (1 - \theta)\mu$.

(2) After a corrector step, $(\bar{x}, \bar{z}) \in \mathcal{N}(1/4)$ and $\bar{\mu} = \mu$.

Proof of Part (1). The formula for $\bar{\mu}$ follows from part (3) of Theorem 22.2 by putting $\delta = 0$. The fact that $(\bar{x}, \bar{z}) \in \mathcal{N}(1/2)$ is an immediate consequence of the choice of $\theta$. □


Before proving Part (2) of the Theorem, we need to introduce some notation and prove a few technical results. Let

\[
\begin{aligned}
p &= X^{-1/2} Z^{1/2} \Delta x, \\
q &= X^{1/2} Z^{-1/2} \Delta z, \\
r &= p + q \\
&= X^{-1/2} Z^{-1/2} (Z \Delta x + X \Delta z) \\
&= X^{-1/2} Z^{-1/2} (\delta \mu e - XZe).
\end{aligned}
\tag{22.11}
\]

The technical results are summarized in the following lemma.

Lemma 22.4. The following statements are true:

(1) $\|PQe\| \le \frac{1}{2} \|r\|^2$.

(2) If $\delta = 0$, then $\|r\|^2 = n\mu$.

(3) If $\delta = 1$ and $(x, z) \in \mathcal{N}(\beta)$, then $\|r\|^2 \le \beta^2 \mu / (1 - \beta)$.

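Proof. (1) First note that $p^T q = \Delta x^T \Delta z = 0$ by Theorem 22.2(1). Hence,

\[
\|r\|^2 = \|p + q\|^2 = p^T p + 2 p^T q + q^T q = \sum_j (p_j^2 + q_j^2).
\]

Therefore,

\[
\begin{aligned}
\|r\|^4 &= \Bigl( \sum_j (p_j^2 + q_j^2) \Bigr)^2 \\
&\ge \sum_j (p_j^2 + q_j^2)^2 \\
&= \sum_j \left( (p_j^2 - q_j^2)^2 + 4 p_j^2 q_j^2 \right) \\
&\ge 4 \sum_j p_j^2 q_j^2 \\
&= 4 \|PQe\|^2.
\end{aligned}
\]

Taking square roots yields the desired inequality.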

(2) Putting $\delta = 0$ in (22.11), we see that $r = -X^{1/2} Z^{1/2} e$. Therefore, $\|r\|^2 = z^T x = n\mu$.

(3) Suppose that $(x, z) \in \mathcal{N}(\beta)$. Whenever the norm of a vector is smaller than some number, the magnitude of each component of the vector must also be smaller than this number. Hence, $|x_j z_j - \mu| \le \beta\mu$. It is easy to see that this inequality is equivalent to

\[
(1 - \beta)\mu \le x_j z_j \le (1 + \beta)\mu. \tag{22.12}
\]

Now putting $\delta = 1$ in (22.11), we get

\[
\|r\|^2 = \sum_j \frac{(x_j z_j - \mu)^2}{x_j z_j}.
\]

Therefore, using the lower bound given in (22.12), we get the following upper bound:

\[
\|r\|^2 \le \frac{1}{(1 - \beta)\mu} \sum_j (x_j z_j - \mu)^2.
\]

Finally, since $(x, z) \in \mathcal{N}(\beta)$, we see that the above sum is bounded by $\beta^2 \mu^2$. This gives the claimed inequality. □


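Proof of Theorem 22.3(2). Since $\delta = 1$ (and $\theta = 1$) in a corrector step, it follows from Theorem 22.2(4) that $\bar{X}\bar{Z}e - \bar{\mu}e = \Delta X \Delta Z e = PQe$. Therefore, parts (1) and (3) of Lemma 22.4 imply that

\[
\| \bar{X}\bar{Z}e - \bar{\mu}e \| = \|PQe\| \le \frac{1}{2} \|r\|^2 \le \frac{1}{2} \, \frac{(1/2)^2}{1 - 1/2} \, \mu = \frac{1}{4} \mu. \tag{22.13}
\]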




We also need to show that $(\bar{x}, \bar{z}) > 0$. For $0 \le t \le 1$, let

\[
x(t) = x + t \Delta x, \quad z(t) = z + t \Delta z, \quad \text{and} \quad \mu(t) = \mu(x(t), z(t)).
\]

Then from part (4) of Theorem 22.2, we have

\[
X(t)Z(t)e - \mu(t)e = (1 - t)(XZe - \mu e) + t^2 \Delta X \Delta Z e.
\]

The right-hand side is the sum of two vectors. Since the length of the sum of two vectors is no greater than the sum of the lengths (i.e., by the triangle inequality), it follows that

\[
\| X(t)Z(t)e - \mu(t)e \| \le (1 - t) \| XZe - \mu e \| + t^2 \| \Delta X \Delta Z e \| \tag{22.14}
\]

(note that we’ve pulled the scalars out of the norms). Now, since $(x, z) \in \mathcal{N}(1/2)$, we have $\| XZe - \mu e \| \le \mu/2$. Furthermore, from (22.13) we have that $\| \Delta X \Delta Z e \| = \|PQe\| \le \mu/4$. Replacing the norms in (22.14) with these upper bounds, we get the following bound:

\[
\| X(t)Z(t)e - \mu(t)e \| \le (1 - t)\frac{\mu}{2} + t^2 \frac{\mu}{4} \le \frac{\mu}{2} \tag{22.15}
\]

(the second inequality follows from the obvious facts that $t^2 \le t$ and $\mu/4 \le \mu/2$). Now, consider a specific component $j$. It follows from (22.15) that

\[
x_j(t) z_j(t) - \mu(t) \ge -\frac{\mu}{2}.
\]

Since $\delta = 1$, part (3) of Theorem 22.2 tells us that $\mu(t) = \mu$ for all $t$. Therefore the previous inequality can be written as

\[
x_j(t) z_j(t) \ge \frac{\mu}{2} > 0. \tag{22.16}
\]

This inequality then implies that $x_j(t) > 0$ and $z_j(t) > 0$ for all $0 \le t \le 1$ (since they could only become negative by passing through 0, which is ruled out by (22.16)). Putting $t = 1$, we get that $\bar{x}_j > 0$ and $\bar{z}_j > 0$. Since the component $j$ was arbitrary, it follows that $(\bar{x}, \bar{z}) > 0$. Therefore $(\bar{x}, \bar{z}) \in \mathcal{N}(1/4)$. □


2.3. Convergence Analysis. The previous theorem showed that the predictor-corrector algorithm is well defined. The next theorem gives us a lower bound on the progress made by each predictor step.

Theorem 22.5. In each predictor step, $\theta \ge \dfrac{1}{2\sqrt{n}}$.


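Proof. Using the same notation as in the proof of Theorem 22.3, we have the inequality:

\[
\| X(t)Z(t)e - \mu(t)e \| \le (1 - t) \| XZe - \mu e \| + t^2 \| \Delta X \Delta Z e \|. \tag{22.17}
\]

This time, however, $(x, z) \in \mathcal{N}(1/4)$ and $\delta = 0$. Hence,

\[
\| XZe - \mu e \| \le \frac{\mu}{4}
\]

and, from parts (1) and (2) of Lemma 22.4,

\[
\| \Delta X \Delta Z e \| = \|PQe\| \le \frac{1}{2} \|r\|^2 = \frac{1}{2} n\mu.
\]

Using these two bounds in (22.17), we get the following bound:

\[
\| X(t)Z(t)e - \mu(t)e \| \le (1 - t)\frac{\mu}{4} + t^2 \frac{n\mu}{2}.
\]

Now, fix a $t \le (2\sqrt{n})^{-1}$. For such a $t$, we have $t^2 n/2 \le 1/8$. Therefore, using the fact that $t \le 1/2$ for $n \ge 2$, we get

\[
\begin{aligned}
\| X(t)Z(t)e - \mu(t)e \| &\le (1 - t)\frac{\mu}{4} + \frac{\mu}{8} \\
&\le (1 - t)\frac{\mu}{4} + (1 - t)\frac{\mu}{4} \\
&= (1 - t)\frac{\mu}{2} \\
&= \frac{\mu(t)}{2}.
\end{aligned}
\]

Hence, as in the previous theorem, $(x(t), z(t)) \in \mathcal{N}(1/2)$. Since $t$ was an arbitrary number less than $(2\sqrt{n})^{-1}$, it follows that $\theta \ge (2\sqrt{n})^{-1}$. □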


Let $(x^{(k)}, z^{(k)})$ denote the solution after the $k$th iteration and let

\[
\rho^{(k)} = \rho(x^{(k)}, z^{(k)}) \quad \text{and} \quad \mu^{(k)} = \mu(x^{(k)}, z^{(k)}).
\]

The algorithm starts with $x^{(0)} = z^{(0)} = e$. Therefore, $\mu^{(0)} = 1$. Our aim is to show that $\mu^{(k)}$ and $\rho^{(k)}$ tend to zero as $k$ tends to infinity. The previous theorem together with Theorem 22.3 implies that, after an even number of iterations, say $2k$, the following inequality holds:

\[
\mu^{(2k)} \le \left( 1 - \frac{1}{2\sqrt{n}} \right)^k.
\]

Also, since the corrector steps don’t change the value of $\mu$, it follows that

\[
\mu^{(2k-1)} = \mu^{(2k)}.
\]

From these two statements, we see that

\[
\lim_{k \to \infty} \mu^{(k)} = 0.
\]

Now, consider $\rho^{(k)}$. It follows from parts (2) and (3) of Theorem 22.2 that the reduction in infeasibility tracks the reduction in noncomplementarity. Hence,

\[
\rho^{(k)} = \mu^{(k)} \rho^{(0)}.
\]

Therefore, the fact that $\mu^{(k)}$ tends to zero implies the same for $\rho^{(k)}$.

In fact, more can be said:

Theorem 22.6. The limits $x^* = \lim_{k \to \infty} x^{(k)}$ and $z^* = \lim_{k \to \infty} z^{(k)}$ exist and $(x^*, z^*)$ is optimal. Furthermore, the vectors $x^*$ and $z^*$ are strictly complementary to each other. That is, for each $j$, $x_j^* z_j^* = 0$ but either $x_j^* > 0$ or $z_j^* > 0$.




The proof is fairly technical, and so instead of proving it, we prove the following theorem, which captures the main idea.

Theorem 22.7. There exist positive constants $c_1, c_2, \ldots, c_n$ such that $(x, z) \in \mathcal{N}(\beta)$ implies that $x_j + z_j \ge c_j > 0$ for each $j = 1, 2, \ldots, n$.

Proof. Put $\mu = \mu(x, z)$ and $\rho = \rho(x, z) = \mu \rho^{(0)}$. Let $(x^*, z^*)$ be a strictly complementary feasible solution (the existence of which is guaranteed by Theorem 10.6). We begin by studying the expression $z^T x^* + x^T z^*$. Since $Ax^* + z^* = 0$, we have that

\[
\begin{aligned}
z^T x^* + x^T z^* &= z^T x^* - x^T A x^* \\
&= (-A^T x + z)^T x^*.
\end{aligned}
\]

By the skew-symmetry of $A$, we see that $-A^T x + z = Ax + z = \rho$. And, since $\rho = \mu \rho^{(0)}$, we get

\[
z^T x^* + x^T z^* = \mu \, \rho^{(0)T} x^*. \tag{22.18}
\]

The factor $\rho^{(0)T} x^*$ is a constant (i.e., it does not depend on $x$ or $z$). Let us denote it by $M$. Since all the terms in the two products on the left in (22.18) are nonnegative, it follows that each one is bounded by the right-hand side. So if we focus on a particular index $j$, we get the following bounds:

\[
z_j x_j^* \le \mu M \quad \text{and} \quad x_j z_j^* \le \mu M. \tag{22.19}
\]

Now, we use the assumption that $(x, z) \in \mathcal{N}(\beta)$ to see that

\[
x_j z_j \ge (1 - \beta)\mu.
\]

In other words, $\mu \le x_j z_j / (1 - \beta)$, and so the inequalities in (22.19) become

\[
z_j x_j^* \le \frac{M}{1 - \beta} z_j x_j \quad \text{and} \quad x_j z_j^* \le \frac{M}{1 - \beta} x_j z_j.
\]

Since $x_j$ and $z_j$ are strictly positive, we can divide by them (and the constants) to get

\[
\frac{1 - \beta}{M} x_j^* \le x_j \quad \text{and} \quad \frac{1 - \beta}{M} z_j^* \le z_j.
\]

Putting

\[
c_j = \frac{1 - \beta}{M} (x_j^* + z_j^*),
\]

we get the desired lower bound on $x_j + z_j$. □


2.4. Complexity of the Predictor-Corrector Algorithm. Of course, in practice we don’t run an infinite number of iterations. Instead, we set a priori a threshold and stop when $\mu^{(k)}$ falls below it. The threshold is usually denoted by $2^{-L}$ where $L$ is some number. Typically, we want the threshold to be about $10^{-8}$, which corresponds to $L \approx 26$.

As we saw before, after an even number of iterations, say $2k$, the $\mu$-value is bounded by the following inequality:

\[
\mu^{(2k)} \le \left( 1 - \frac{1}{2\sqrt{n}} \right)^k.
\]

Hence, it suffices to pick a $k$ big enough to have

\[
\left( 1 - \frac{1}{2\sqrt{n}} \right)^k \le 2^{-L}.
\]

Taking logarithms of both sides and solving for $k$, we see that any

\[
k \ge \frac{L}{-\log\left( 1 - \frac{1}{2\sqrt{n}} \right)}
\]

will do. Since $-\log(1 - x) \ge x$, we get

\[
2L\sqrt{n} \ge \frac{L}{-\log\left( 1 - \frac{1}{2\sqrt{n}} \right)}.
\]

Therefore, any $k \ge 2L\sqrt{n}$ will do. In particular, $k = 2L\sqrt{n}$ rounded up to the nearest integer will suffice. Since $k$ represents half the number of iterations, it follows that it will take at most $4L\sqrt{n}$ iterations for the $\mu$-value to fall below the threshold of $2^{-L}$. This bound implies that the method is a polynomial algorithm, since it says that any desired precision can be obtained in a number of iterations that is bounded above by a polynomial in $n$ (here, $4L\sqrt{n}$ is not itself a polynomial but is bounded above by, say, a linear function in $n$ for $n \ge 2$).
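The bound is easy to evaluate for concrete numbers. The sketch below assumes the logarithm is taken base 2, so that a tolerance `tol` corresponds to $L = \log_2(1/\text{tol})$; the function name `iteration_bound` and the sample values of `n` and `tol` are illustrative choices, not from the text.

```python
import math


def iteration_bound(n, tol):
    """Return (L, iterations): tol = 2^{-L}; k = ceil(2 L sqrt(n)); iterations = 2k."""
    L = math.log2(1.0 / tol)
    k = math.ceil(2 * L * math.sqrt(n))
    return L, 2 * k


# Illustrative values (assumptions).
n = 100
tol = 1e-8
L, iterations = iteration_bound(n, tol)

# The guaranteed bound on mu after that many iterations (k predictor steps).
guaranteed_mu = (1 - 1 / (2 * math.sqrt(n))) ** (iterations // 2)
```

For `tol = 1e-8` the value of `L` falls between 26 and 27, in line with the figure quoted above, and `guaranteed_mu` is below `tol`. The bound is a worst-case guarantee; nothing here says how many iterations a given problem actually needs.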


2.5. The KKT System. We end this section on homogeneous self-dual problems by briefly discussing the KKT system (22.5)-(22.6). Solving this system of equations is the most time consuming step within each iteration of the predictor-corrector algorithm. There are several ways in which one can organize the computation. The approach that most parallels what we have done before is first to solve (22.6) for $\Delta z$,

\[
\begin{aligned}
\Delta z &= X^{-1} (-Z \Delta x + \delta\mu e - XZe) \\
&= -X^{-1} Z \Delta x + \delta\mu X^{-1} e - z,
\end{aligned}
\tag{22.20}
\]

and then to eliminate it from (22.5) to get the following reduced KKT system:

\[
(A - X^{-1} Z) \Delta x = -(1 - \delta)\rho + z - \delta\mu X^{-1} e.
\]

In  the  next  section,  we  apply  the  algorithm  developed  in  this  section  to  the  homoge¬ 
neous  self-dual  problem  given  by  (22.3). 
3. Back to Standard Form

We return now to the setup in Section 1. Let $z$, $w$, and $\psi$ denote the slack variables for the constraints in problem (22.3):

$$\begin{aligned}\text{maximize}\quad & 0\\ \text{subject to}\quad & -A^Ty + c\phi + z = 0,\\ & Ax - b\phi + w = 0,\\ & -c^Tx + b^Ty + \psi = 0,\\ & x,\ y,\ \phi,\ z,\ w,\ \psi \ge 0.\end{aligned}\tag{22.21}$$

We say that a feasible solution $(\bar{x}, \bar{y}, \bar{\phi}, \bar{z}, \bar{w}, \bar{\psi})$ is strictly complementary if $\bar{x}_j + \bar{z}_j > 0$ for all $j$, $\bar{y}_i + \bar{w}_i > 0$ for all $i$, and $\bar{\phi} + \bar{\psi} > 0$. Theorem 10.6 ensures the existence of such a solution (why?).

The  following  theorem  summarizes  and  extends  the  motivating  discussion  given 
in  Section  1 . 

Theorem 22.8. Suppose that $(\bar{x}, \bar{y}, \bar{\phi}, \bar{z}, \bar{w}, \bar{\psi})$ is a strictly complementary feasible (hence, optimal) solution to (22.21).

(1) If $\bar{\phi} > 0$, then $x^* = \bar{x}/\bar{\phi}$ is optimal for the primal problem (22.1) and $y^* = \bar{y}/\bar{\phi}$ is optimal for its dual (22.2).

(2) If $\bar{\phi} = 0$, then either $c^T\bar{x} > 0$ or $b^T\bar{y} < 0$.

(a) If $c^T\bar{x} > 0$, then the dual problem is infeasible.

(b) If $b^T\bar{y} < 0$, then the primal problem is infeasible.

PROOF. Part (1) was proved in Section 1. For part (2), suppose that $\bar{\phi} = 0$. By strict complementarity, $\bar{\psi} > 0$. Hence, $\bar{x}$ and $\bar{y}$ satisfy

$$\begin{aligned}A^T\bar{y} &\ge 0,\\ A\bar{x} &\le 0,\\ b^T\bar{y} &< c^T\bar{x}.\end{aligned}\tag{22.22}$$

From the last inequality, we see that it is impossible to have $b^T\bar{y} \ge 0$ and $c^T\bar{x} \le 0$. That is, either $c^T\bar{x} > 0$ or $b^T\bar{y} < 0$ (or both). Suppose, without loss of generality, that $c^T\bar{x} > 0$. We will prove by contradiction that the dual problem is infeasible. To this end, suppose that there exists a vector $y^0 \ge 0$ such that

$$A^Ty^0 \ge c.\tag{22.23}$$

Since $\bar{x} \ge 0$, we can multiply by it without changing the direction of an inequality. So multiplying (22.23) on the left by $\bar{x}^T$, we get

$$\bar{x}^TA^Ty^0 \ge \bar{x}^Tc.$$

Now, the right-hand side is strictly positive. But inequality (22.22) together with the nonnegativity of $y^0$ implies that the left-hand side is nonpositive:

$$\bar{x}^TA^Ty^0 = (A\bar{x})^Ty^0 \le 0.$$

This is a contradiction and therefore the dual must be infeasible.


□ 
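The case analysis of the theorem can be sketched as a small routine that interprets a strictly complementary solution of (22.21). The function name `interpret`, the tolerance `tol`, and the sample data are illustrative assumptions; a real code would take its tolerances from the stopping rule of the algorithm.

```python
def interpret(x, y, phi, c, b, tol=1e-9):
    """Classify a strictly complementary solution (x, y, phi) of (22.21)."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    if phi > tol:
        # Case (1): rescale by phi to recover optimal primal and dual solutions.
        return "optimal", [xj / phi for xj in x], [yi / phi for yi in y]
    # Case (2): phi = 0, so at least one of the two tests below must fire.
    status = []
    if dot(c, x) > tol:
        status.append("dual infeasible")
    if dot(b, y) < -tol:
        status.append("primal infeasible")
    return " and ".join(status), None, None

# Example: maximize x subject to -x <= 1, x >= 0 (A = [-1], b = [1], c = [1]).
# The point x = 1, y = 0, phi = 0 (with z = 0, w = 1, psi = 1) is feasible
# and strictly complementary for (22.21), and c^T x = 1 > 0.
print(interpret([1.0], [0.0], 0.0, c=[1.0], b=[1.0])[0])
```

In this example the primal is unbounded, which is consistent with the reported dual infeasibility.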
3.1. The Reduced KKT System. The right-hand side in the reduced KKT system involves the vector of infeasibilities. We partition this vector into three parts as follows:

$$\begin{bmatrix}\sigma\\ \rho\\ \gamma\end{bmatrix} = \begin{bmatrix}0 & -A^T & c\\ A & 0 & -b\\ -c^T & b^T & 0\end{bmatrix}\begin{bmatrix}x\\ y\\ \phi\end{bmatrix} + \begin{bmatrix}z\\ w\\ \psi\end{bmatrix} = \begin{bmatrix}-A^Ty + c\phi + z\\ Ax - b\phi + w\\ -c^Tx + b^Ty + \psi\end{bmatrix}.$$

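The reduced KKT system for (22.3) is given by

$$\begin{bmatrix}-X^{-1}Z & -A^T & c\\ A & -Y^{-1}W & -b\\ -c^T & b^T & -\psi/\phi\end{bmatrix}\begin{bmatrix}\Delta x\\ \Delta y\\ \Delta\phi\end{bmatrix} = \begin{bmatrix}\hat{\sigma}\\ \hat{\rho}\\ \hat{\gamma}\end{bmatrix},\tag{22.24}$$

where

$$\begin{bmatrix}\hat{\sigma}\\ \hat{\rho}\\ \hat{\gamma}\end{bmatrix} = \begin{bmatrix}-(1 - \delta)\sigma + z - \delta\mu X^{-1}e\\ -(1 - \delta)\rho + w - \delta\mu Y^{-1}e\\ -(1 - \delta)\gamma + \psi - \delta\mu/\phi\end{bmatrix}.$$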

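This system is not symmetric. One could use a general purpose equation solver to solve it, but its special structure would be mostly ignored by such a solver. To exploit the structure, we solve this system in two stages. We start by using the first two equations to solve simultaneously for $\Delta x$ and $\Delta y$ in terms of $\Delta\phi$:

$$\begin{bmatrix}\Delta x\\ \Delta y\end{bmatrix} = \begin{bmatrix}-X^{-1}Z & -A^T\\ A & -Y^{-1}W\end{bmatrix}^{-1}\left(\begin{bmatrix}\hat{\sigma}\\ \hat{\rho}\end{bmatrix} - \begin{bmatrix}c\\ -b\end{bmatrix}\Delta\phi\right).$$

Introducing abbreviating notations, we can write

$$\begin{bmatrix}\Delta x\\ \Delta y\end{bmatrix} = \begin{bmatrix}f_x\\ f_y\end{bmatrix} - \begin{bmatrix}g_x\\ g_y\end{bmatrix}\Delta\phi,\tag{22.25}$$

where the vectors

$$f = \begin{bmatrix}f_x\\ f_y\end{bmatrix} \quad\text{and}\quad g = \begin{bmatrix}g_x\\ g_y\end{bmatrix}$$

are found by solving the following two systems of equations:

$$\begin{bmatrix}-X^{-1}Z & -A^T\\ A & -Y^{-1}W\end{bmatrix}\begin{bmatrix}f_x\\ f_y\end{bmatrix} = \begin{bmatrix}\hat{\sigma}\\ \hat{\rho}\end{bmatrix}$$

and

$$\begin{bmatrix}-X^{-1}Z & -A^T\\ A & -Y^{-1}W\end{bmatrix}\begin{bmatrix}g_x\\ g_y\end{bmatrix} = \begin{bmatrix}c\\ -b\end{bmatrix}.$$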

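Then we use (22.25) to eliminate $\Delta x$ and $\Delta y$ from the last equation in (22.24):

$$-c^T\left(f_x - g_x\Delta\phi\right) + b^T\left(f_y - g_y\Delta\phi\right) - \frac{\psi}{\phi}\Delta\phi = \hat{\gamma}.$$

We then solve for $\Delta\phi$:

$$\Delta\phi = \frac{c^Tf_x - b^Tf_y + \hat{\gamma}}{c^Tg_x - b^Tg_y - \psi/\phi}.$$

Given $\Delta\phi$, (22.25) determines $\Delta x$ and $\Delta y$. Once these vectors are known, (22.20) is used to compute the step directions for the slack variables:

$$\begin{aligned}\Delta z &= -X^{-1}Z\Delta x + \delta\mu X^{-1}e - z,\\ \Delta w &= -Y^{-1}W\Delta y + \delta\mu Y^{-1}e - w,\\ \Delta\psi &= -\frac{\psi}{\phi}\Delta\phi + \delta\mu/\phi - \psi.\end{aligned}$$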
We now see that the reduced KKT system can be solved by solving two systems of equations for $f$ and $g$. These two systems both involve the same matrix. Furthermore, these systems can be formulated as quasidefinite systems by negating the first equation and then reordering the equations appropriately. For example, the quasidefinite system for $g$ is

$$\begin{bmatrix}-Y^{-1}W & A\\ A^T & X^{-1}Z\end{bmatrix}\begin{bmatrix}g_y\\ g_x\end{bmatrix} = \begin{bmatrix}-b\\ -c\end{bmatrix}.$$

Therefore, the techniques developed in Chapter 20 can be used to solve these systems of equations. In particular, to solve the two systems, the quasidefinite matrix only needs to be factored once. Then the two systems can be solved by doing two forward and two backward substitutions. Since factorization requires more computation than forward and backward substitutions, one would expect to be able to solve these two systems in much less time than if they were each being solved from scratch. In fact, it is often the case that one can solve two systems involving the same quasidefinite matrix in little more time than is required to solve just one such system.
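The two-stage elimination can be checked on a tiny instance. The sketch below is an illustration only: the data, the right-hand sides `sigma_hat`, `rho_hat`, `gamma_hat`, and the helper names are arbitrary assumptions, and for clarity it calls a dense exact solver twice rather than factoring the matrix once as recommended above. It compares the two-stage result with a direct solve of the full system (22.24).

```python
from fractions import Fraction as F

def solve(M, rhs):
    """Solve M u = rhs by Gauss-Jordan elimination in exact arithmetic."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def build_K(A, x, z, y, w):
    """Assemble the block matrix [[-X^{-1}Z, -A^T], [A, -Y^{-1}W]]."""
    m, n = len(A), len(x)
    K = []
    for j in range(n):
        row = [F(0)] * (n + m)
        row[j] = -F(z[j]) / x[j]
        for i in range(m):
            row[n + i] = -F(A[i][j])
        K.append(row)
    for i in range(m):
        row = [F(A[i][j]) for j in range(n)] + [F(0)] * m
        row[n + i] = -F(w[i]) / y[i]
        K.append(row)
    return K

# Arbitrary illustrative data (m = 1 constraint, n = 2 variables).
A, b, c = [[1, 2]], [3], [1, 1]
x, z, y, w, phi, psi = [1, 2], [3, 1], [2], [1], 2, 1
sigma_hat, rho_hat, gamma_hat = [1, -1], [2], 1
n = len(c)

K = build_K(A, x, z, y, w)
col = list(c) + [-bi for bi in b]            # last column of (22.24)
f = solve(K, sigma_hat + rho_hat)            # stage 1: system for f
g = solve(K, col)                            # stage 1: system for g
dphi = (dot(c, f[:n]) - dot(b, f[n:]) + gamma_hat) / (
    dot(c, g[:n]) - dot(b, g[n:]) - F(psi) / phi)
dxy = [fi - gi * dphi for fi, gi in zip(f, g)]   # (22.25)

# Direct solve of the full (n + m + 1)-dimensional system for comparison.
full = [K[r] + [F(col[r])] for r in range(len(K))]
full.append([-F(cj) for cj in c] + [F(bi) for bi in b] + [-F(psi) / phi])
direct = solve(full, sigma_hat + rho_hat + [gamma_hat])
print(direct == dxy + [dphi])
```

Because the arithmetic is exact, the comparison is an equality test rather than a tolerance check.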

The  full  homogeneous  self-dual  method  is  summarized  in  Figure  22.1. 

4.  Simplex  Method  vs.  Interior-Point  Methods 

Finally,  we  compare  the  performance  of  interior-point  methods  with  the  sim¬ 
plex  method.  For  this  comparison,  we  have  chosen  the  homogeneous  self-dual 
method  described  in  this  chapter  and  the  self-dual  simplex  method  (see  Figure  7.1). 
In  the  interest  of  efficiency  certain  liberties  have  been  taken  with  the  implementa¬ 
tions.  For  example,  in  the  homogeneous  self-dual  method,  (18.6)  is  used  to  com¬ 
pute  “long”  step  lengths  instead  of  the  more  conservative  “short”  step  lengths  in 
(22.10).  The  code  fragments  implementing  each  of  these  two  algorithms  are  shown 
in  Appendix  A. 

A standard collection of test problems, the so-called NETLIB suite, was used in the comparison. Problems in this collection are formulated with bounds and ranges:

$$\begin{aligned}\text{minimize}\quad & c^Tx\\ \text{subject to}\quad & b \le Ax \le b + r,\\ & l \le x \le u.\end{aligned}$$

However,  to  keep  the  algorithms  as  simple  as  possible,  they  were  implemented  only 
for  problems  in  our  standard  inequality  form.  Therefore,  the  problems  from  the 
NETLIB  suite  were  converted  to  standard  form  as  follows: 
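initialize
    $(x, y, \phi, z, w, \psi) = (e, e, 1, e, e, 1)$
while (not optimal) {
    $\mu = (z^Tx + w^Ty + \psi\phi)/(n + m + 1)$
    $\delta = 0$ on odd iterations, $\delta = 1$ on even iterations
    $\hat{\rho} = -(1 - \delta)(Ax - b\phi + w) + w - \delta\mu Y^{-1}e$
    $\hat{\sigma} = -(1 - \delta)(-A^Ty + c\phi + z) + z - \delta\mu X^{-1}e$
    $\hat{\gamma} = -(1 - \delta)(b^Ty - c^Tx + \psi) + \psi - \delta\mu/\phi$
    solve the two $(n + m) \times (n + m)$ quasidefinite systems:
        $\begin{bmatrix}-Y^{-1}W & A\\ A^T & X^{-1}Z\end{bmatrix}\begin{bmatrix}f_y\\ f_x\end{bmatrix} = \begin{bmatrix}\hat{\rho}\\ -\hat{\sigma}\end{bmatrix}$ and $\begin{bmatrix}-Y^{-1}W & A\\ A^T & X^{-1}Z\end{bmatrix}\begin{bmatrix}g_y\\ g_x\end{bmatrix} = \begin{bmatrix}-b\\ -c\end{bmatrix}$
    $\Delta\phi = \dfrac{c^Tf_x - b^Tf_y + \hat{\gamma}}{c^Tg_x - b^Tg_y - \psi/\phi}$
    $\begin{bmatrix}\Delta x\\ \Delta y\end{bmatrix} = \begin{bmatrix}f_x\\ f_y\end{bmatrix} - \begin{bmatrix}g_x\\ g_y\end{bmatrix}\Delta\phi$
    $\Delta z = -X^{-1}Z\Delta x + \delta\mu X^{-1}e - z$
    $\Delta w = -Y^{-1}W\Delta y + \delta\mu Y^{-1}e - w$
    $\Delta\psi = -\frac{\psi}{\phi}\Delta\phi + \delta\mu/\phi - \psi$
    $\theta = \max\{t : (x(t), \ldots, \psi(t)) \in \mathcal{N}(1/2)\}$ on odd iterations, $\theta = 1$ on even iterations
    $x \leftarrow x + \theta\Delta x$, $z \leftarrow z + \theta\Delta z$
    $y \leftarrow y + \theta\Delta y$, $w \leftarrow w + \theta\Delta w$
    $\phi \leftarrow \phi + \theta\Delta\phi$, $\psi \leftarrow \psi + \theta\Delta\psi$
}

Figure 22.1. The homogeneous self-dual method.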


$$\begin{aligned}-\text{maximize}\quad & -c^Tx - c^Tl\\ \text{subject to}\quad & -Ax \le -b + Al,\\ & Ax \le b + r - Al,\\ & x \le u - l,\\ & x \ge 0.\end{aligned}$$

Of  course,  this  transformation  is  invalid  when  any  of  the  lower  bounds  are  infinite. 
Therefore,  such  problems  have  been  dropped  in  our  experiment.  Also,  this  trans¬ 
formation  introduces  significant  computational  inefficiency  but,  since  it  was  applied 
equally  to  the  problems  presented  to  both  methods,  the  comparison  remains  valid. 
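The conversion can be sketched in a few lines. This is a minimal illustration under the assumption that all lower bounds are finite (as required above); the function name `to_standard_form` and the dense list-of-lists representation are choices made for the example, not the layout used by the book's software.

```python
def to_standard_form(A, b, r, c, l, u):
    """Shift x by l and return (A_std, b_std, c_std, const) so that the
    original problem equals -max{c_std^T x' + const : A_std x' <= b_std, x' >= 0}."""
    m, n = len(A), len(c)
    Al = [sum(A[i][j] * l[j] for j in range(n)) for i in range(m)]
    A_std = [[-a for a in row] for row in A]              # -A x' <= -b + A l
    b_std = [-b[i] + Al[i] for i in range(m)]
    A_std += [list(row) for row in A]                     #  A x' <= b + r - A l
    b_std += [b[i] + r[i] - Al[i] for i in range(m)]
    A_std += [[1 if k == j else 0 for k in range(n)] for j in range(n)]  # x' <= u - l
    b_std += [u[j] - l[j] for j in range(n)]
    c_std = [-cj for cj in c]
    const = -sum(cj * lj for cj, lj in zip(c, l))
    return A_std, b_std, c_std, const

A_std, b_std, c_std, const = to_standard_form(
    A=[[1, 1]], b=[2], r=[3], c=[1, 2], l=[1, 0], u=[4, 5])
print(A_std, b_std, c_std, const)
```

Note that each original row produces two inequality rows and each variable produces an extra bound row, which is the source of the inefficiency mentioned above.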
The results of our experiment are shown in Table 22.1. The most obvious
observation  is  that  the  simplex  method  is  generally  faster  and  that,  for  many  prob¬ 
lems,  the  slower  method  is  not  more  than  3  or  4  times  slower.  For  problems  in  this 
suite,  these  results  are  consistent  with  results  reported  in  the  literature.  However,  it 
must  be  noted  that  the  problems  in  this  suite  range  only  from  small  to  medium  in 
size.  The  largest  problem,  fit2p,  has  about  3,000  constraints  and  about  14,000  vari¬ 
ables.  By  today’s  standards,  this  problem  is  considered  of  medium  size.  For  larger 
problems,  reports  in  the  literature  indicate  that  interior  point  methods  tend  to  be 
superior  although  the  results  are  very  much  dependent  on  the  specific  class  of  prob¬ 
lems.  In  the  remaining  chapters  of  this  book  we  shall  consider  various  extensions 
of  the  linear  programming  model.  We  shall  see  that  the  simplex  method  is  partic¬ 
ularly  well  suited  for  solving  integer  programming  problems  studied  in  Chapter  23 
whereas  interior  point  methods  are  more  appropriate  for  extensions  into  the  qua¬ 
dratic  and  convex  programming  problems  studied  in  Chapters  24  and  25.  These 
considerations  are  often  more  important  than  speed.  There  are,  of  course,  excep¬ 
tions.  For  example,  the  interior-point  method  is  about  900  times  faster  than  the 
simplex  method  on  problem  fit2p.  Such  a  difference  cannot  be  ignored. 

When  comparing  algorithms  it  is  always  tempting  to  look  for  ways  to  improve 
the  slower  method.  There  are  obvious  enhancements  to  the  interior-point  method 
used  in  this  implementation.  For  example,  one  could  use  the  same  LDLT  factoriza¬ 
tion  to  compute  both  the  predictor  and  the  corrector  directions.  When  implemented 
properly,  this  enhancement  alone  can  almost  halve  the  times  for  this  method. 

Of  course,  the  winning  algorithm  can  also  be  improved  (but,  significant  overall 
improvements  such  as  the  one  just  mentioned  for  the  interior-point  method  are  not  at 
all  obvious).  Looking  at  the  table,  we  note  that  the  interior-point  method  solved  both 
fit2p  and  fit2d  in  roughly  the  same  amount  of  time.  These  two  problems  are  duals  of 
each  other  and  hence  any  algorithm  that  treats  the  primal  and  the  dual  symmetrically 
should  take  about  the  same  time  to  solve  them.  Now,  look  at  the  simplex  method’s 
performance  on  these  two  problems.  There  is  a  factor  of  36  difference  between 
them.  The  reason  is  that,  even  though  we  have  religiously  adhered  to  primal-dual 
symmetry  in  our  development  of  the  simplex  method,  an  asymmetry  did  creep  in.  To 
see it, note that the basis matrix is always a square submatrix of $[\,A \;\; I\,]$. That is,
it  is  an  m  x  m  matrix.  If  we  apply  the  algorithm  to  the  dual  problem,  then  the  basis 
matrix  is  n  x  n.  Hence,  even  though  the  sequence  of  iterates  generated  should  be 
identical  with  the  two  problems,  the  computations  involved  in  each  iteration  can  be 
very  different  if  m  and  n  are  not  about  the  same.  This  is  the  case  for  the  fit2p/fit2d 
pair.  Of  course,  one  can  easily  think  up  schemes  to  overcome  this  difficulty.  But 
even  if  the  performance  of  the  simplex  method  on  fit2p  can  be  brought  in  line  with 
its  performance  on  fit2d,  it  will  still  be  about  25  times  slower  than  the  interior-point 
on  this  problem — a  difference  that  remains  significant. 
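The factors quoted in this discussion can be recomputed from the times in Table 22.1. The helper `seconds` below is a hypothetical utility written for this check; the four time strings are copied from the table.

```python
import re

def seconds(t):
    """Convert a time such as '1 h 3 min 14.37 s' to seconds."""
    units = {"h": 3600.0, "min": 60.0, "s": 1.0}
    return sum(float(v) * units[u]
               for v, u in re.findall(r"([\d.]+)\s*(h|min|s)\b", t))

simplex_fit2d = seconds("1 h 3 min 14.37 s")
simplex_fit2p = seconds("36 h 31 min 31.80 s")
interior_fit2d = seconds("4 min 27.66 s")
interior_fit2p = seconds("2 min 35.67 s")

print(simplex_fit2p / interior_fit2p)   # simplex vs. interior point on fit2p
print(simplex_fit2p / simplex_fit2d)    # simplex on fit2p vs. simplex on fit2d
print(simplex_fit2d / interior_fit2p)   # simplex on fit2d vs. interior point on fit2p
```

The three ratios come out at roughly 845, 35, and 24, so the figures quoted in the text should be read as rounded.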
| Name | Simplex method (time) | Interior point (time) | Name | Simplex method (time) | Interior point (time) |
|---|---|---|---|---|---|
| 25fv47 | 2 min 55.70 s | 3 min 14.82 s | maros | 1 min 0.87 s | 3 min 19.43 s |
| 80bau3b | 7 min 59.57 s | 2 min 34.84 s | nesm | 1 min 40.78 s | 6 min 21.28 s |
| adlittle | 0 min 0.26 s | 0 min 0.47 s | pilot87 | * | * |
| afiro | 0 min 0.03 s | 0 min 0.11 s | pilotnov | * | 4 min 15.31 s |
| agg | 0 min 1.09 s | 0 min 4.59 s | pilots | * | 32 min 48.15 s |
| agg2 | 0 min 1.64 s | 0 min 21.42 s | recipe | 0 min 0.21 s | 0 min 1.04 s |
| agg3 | 0 min 1.72 s | 0 min 26.52 s | sc105 | 0 min 0.28 s | 0 min 0.37 s |
| bandm | 0 min 15.87 s | 0 min 9.01 s | sc205 | 0 min 1.30 s | 0 min 0.84 s |
| beaconfd | 0 min 0.67 s | 0 min 6.42 s | sc50a | 0 min 0.09 s | 0 min 0.17 s |
| blend | 0 min 0.40 s | 0 min 0.56 s | sc50b | 0 min 0.12 s | 0 min 0.15 s |
| bnl1 | 0 min 38.38 s | 0 min 46.09 s | scagr25 | 0 min 12.93 s | 0 min 4.44 s |
| bnl2 | 3 min 54.52 s | 10 min 19.04 s | scagr7 | 0 min 1.16 s | 0 min 1.05 s |
| boeing1 | 0 min 5.56 s | 0 min 9.14 s | scfxm1 | 0 min 4.44 s | 0 min 7.80 s |
| boeing2 | 0 min 0.80 s | 0 min 1.72 s | scfxm2 | 0 min 14.33 s | 0 min 18.84 s |
| bore3d | 0 min 1.17 s | 0 min 3.97 s | scfxm3 | 0 min 28.92 s | 0 min 28.92 s |
| brandy | 0 min 5.33 s | 0 min 8.44 s | scorpion | 0 min 3.38 s | 0 min 2.64 s |
| czprob | 0 min 50.14 s | 0 min 41.77 s | scrs8 | 0 min 7.15 s | 0 min 9.53 s |
| d2q06c | * | 1 h 11 min 1.93 s | scsd1 | 0 min 0.86 s | 0 min 3.88 s |
| d6cube | 2 min 46.71 s | 13 min 44.52 s | scsd6 | 0 min 2.89 s | 0 min 9.31 s |
| degen2 | 0 min 17.28 s | 0 min 17.02 s | scsd8 | 0 min 28.87 s | 0 min 16.82 s |
| degen3 | 5 min 55.52 s | 3 min 36.73 s | sctap1 | 0 min 2.98 s | 0 min 3.08 s |
| dfl001 | 8 h 55 min 33.05 s | ** | sctap2 | 0 min 7.41 s | 0 min 12.03 s |
| e226 | 0 min 4.76 s | 0 min 6.65 s | sctap3 | 0 min 11.70 s | 0 min 17.18 s |
| etamacro | 0 min 17.94 s | 0 min 43.40 s | seba | 0 min 27.25 s | 0 min 11.90 s |
| fffff800 | 0 min 10.07 s | 1 min 9.15 s | share1b | 0 min 2.07 s | 0 min 10.90 s |
| finnis | 0 min 4.76 s | 0 min 6.17 s | share2b | 0 min 0.47 s | 0 min 0.71 s |
| fit1d | 0 min 18.15 s | 0 min 11.63 s | shell | 0 min 16.12 s | 0 min 29.45 s |
| fit1p | 7 min 10.86 s | 0 min 16.47 s | ship04l | 0 min 3.82 s | 0 min 13.60 s |
| fit2d | 1 h 3 min 14.37 s | 4 min 27.66 s | ship04s | 0 min 3.48 s | 0 min 10.81 s |
| fit2p | 36 h 31 min 31.80 s | 2 min 35.67 s | ship08l | 0 min 17.83 s | 0 min 39.06 s |
| forplan | 0 min 3.99 s | * | ship08s | 0 min 8.85 s | 0 min 19.64 s |
| ganges | 0 min 44.27 s | 0 min 34.89 s | ship12l | 0 min 26.55 s | 1 min 8.62 s |
| gfrdpnc | 0 min 11.51 s | 0 min 8.46 s | ship12s | 0 min 16.75 s | 0 min 30.33 s |
| greenbea | 22 min 45.49 s | 43 min 4.32 s | sierra | 0 min 10.88 s | 0 min 42.89 s |
| grow15 | 0 min 8.55 s | 0 min 58.26 s | standata | 0 min 0.57 s | 0 min 6.60 s |
| grow22 | 0 min 11.79 s | 2 min 0.53 s | standmps | 0 min 2.41 s | 0 min 13.44 s |
| grow7 | 0 min 3.61 s | 0 min 13.57 s | stocfor1 | 0 min 0.22 s | 0 min 0.92 s |
| israel | 0 min 1.83 s | 0 min 2.66 s | stocfor2 | 0 min 45.15 s | 0 min 40.43 s |
| kb2 | 0 min 0.15 s | 0 min 0.34 s | wood1p | 0 min 14.15 s | 7 min 18.47 s |
| lotfi | 0 min 0.81 s | 0 min 3.36 s | woodw | 1 min 48.14 s | 8 min 53.92 s |
| maros-r7 | * | 1 h 31 min 12.06 s | | | |

(*) Denotes numerical difficulties
(**) Denotes insufficient memory

Table 22.1. Comparison between the self-dual simplex method and the homogeneous self-dual interior-point method.

Exercises

22.1 When $n = 1$, the set $\mathcal{N}(\beta)$ is a subset of $\mathbb{R}^2$. Graph it.

22.2 Suppose there is an algorithm for which one can prove that

$$\mu^{(k)} \le \left(1 - \frac{a}{f(n)}\right)^{k}$$

for every $k \ge 1$, where $f(n)$ denotes a specific function of $n$, such as $f(n) = n^2$, and $a$ is a constant. In terms of $a$ and $f$ and the "precision" $L$, give a (tight) upper bound on the number of iterations that would be sufficient to guarantee that

$$\mu^{(k)} \le 2^{-L}.$$
22.3 In Section 3 of Chapter 20, we extended the primal-dual path-following method to treat problems in general form. Extend the homogeneous self-dual method in the same way.


22.4 Long-step variant. Let

$$\mathcal{M}(\beta) = \{(x, z) : \min XZe \ge (1 - \beta)\mu(x, z)\}.$$

(The notation $\min XZe$ denotes the scalar that is the minimum of all the components of the vector $XZe$. Throughout this problem, given any vector $v$, the notation $\min v$ ($\max v$) will denote the minimum (maximum) of the components of $v$.) Fix $\tfrac{1}{2} < \beta < 1$ (say $\beta = 0.95$). A long-step variant of the homogeneous self-dual method starts with an initial $(x, z) \in \mathcal{M}(\beta)$ and in every iteration uses

$$\delta = 2(1 - \beta)$$

and

$$\theta = \max\{t : (x + t\Delta x,\ z + t\Delta z) \in \mathcal{M}(\beta)\}.$$

The goal of this exercise is to analyze the complexity of this algorithm by completing the following steps.

(a) Show that $\mathcal{N}(\beta) \subset \mathcal{M}(\beta) \subset \mathcal{M}(1) = \{(x, z) \ge 0\}$.

(b) Show that $\max(-PQe) \le \|r\|^2/4$. Hint: Start by writing

$$p_jq_j \ge \sum_{i:\,p_iq_i<0} p_iq_i$$

and then use the facts that $p^Tq = 0$, $p_i + q_i = r_i$, and that for any two real numbers $a$ and $b$, $(a + b)^2 \ge 4ab$ (prove this).

(c) Show that if $(x, z) \in \mathcal{M}(\beta)$, then $\|r\|^2 \le n\mu$. Hint: Use (22.11) to write $\|r\|^2 = \sum_j (x_jz_j - \delta\mu)^2/(x_jz_j)$. Expand the numerator, use the definitions of $\mu$ and $\delta$ to simplify, and then use the assumption that $(x, z) \in \mathcal{M}(\beta)$ to take care of the remaining denominator.

(d) Show that if $(x, z) \in \mathcal{M}(\beta)$, then

$$\theta \ge \min\left\{1,\ -\frac{\beta\delta\mu}{\min PQe}\right\} \ge \frac{4\beta\delta}{n}.$$

Hint: Using the same notation as in the proof of Theorem 22.3, fix $t \le \min\{1, -\beta\delta\mu/\min PQe\}$, write

$$x_j(t)z_j(t) - \mu(t) = (1 - t)(x_jz_j - \mu) + t^2\Delta x_j\Delta z_j,$$

and then replace the right-hand side by a lower bound that is independent of $j$. From there, follow your nose until you get the first inequality. The second inequality follows from parts (b) and (c).

(e) As usual letting $\mu^{(k)}$ denote the $\mu$ value associated with the solution on the $k$th iteration of the algorithm, show that

$$\mu^{(k)} \le \left(1 - \frac{4\beta\delta}{n}(1 - \delta)\right)^{k}.$$

(f) Give an upper bound on the number of iterations required to get $\mu^{(k)} \le 2^{-L}$.

(g) Show that $\theta$ can be computed by solving $n$ (univariate) quadratic equations.

(h) A robust implementation of a quadratic equation solver uses the formula

$$x = \begin{cases}\dfrac{-b - \sqrt{b^2 - 4ac}}{2a}, & b \ge 0,\\[1.5ex] \dfrac{2c}{-b + \sqrt{b^2 - 4ac}}, & b < 0,\end{cases}$$

for one of the two roots to $ax^2 + bx + c = 0$ (a similar formula is used for the other one). Show that the two expressions on the right are mathematically equal and suggest a reason to prefer one over the other in the particular cases indicated.
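For readers who want to experiment with the formula in part (h), here is a minimal sketch. It assumes real roots and $a \ne 0$; the function name `stable_root` and the test coefficients are illustrative choices.

```python
import math

def stable_root(a, b, c):
    """One root of a x^2 + b x + c = 0, using the branch selected by sign(b).
    Assumes a != 0 and a nonnegative discriminant."""
    disc = math.sqrt(b * b - 4.0 * a * c)
    if b >= 0:
        # -b and -disc have the same sign, so nothing cancels.
        return (-b - disc) / (2.0 * a)
    # Here -b > 0, so -b + disc is a sum of nonnegative terms.
    return (2.0 * c) / (-b + disc)

# Roots of x^2 - 1e8 x + 1 are close to 1e8 and 1e-8.
a, b, c = 1.0, -1e8, 1.0
naive = (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
print(stable_root(a, b, c), naive)
```

Comparing the two printed values for this example gives a hint toward the reason asked for in the exercise.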

Notes 

The  first  study  of  homogeneous  self-dual  problems  appeared  in  Tucker  (1956). 
This  chapter  is  based  on  the  papers  Mizuno  et  al.  (1993),  Ye  et  al.  (1994),  and 
Xu et al. (1993). The step length formula (22.10) forces the algorithm studied in this chapter to take much shorter steps than those in Chapter 18. In general, algorithms that are based on steps that confine the iterates to $\mathcal{N}(\beta)$ are called short-step methods. A long-step variant of the algorithm can be obtained by enlarging the set $\mathcal{N}(\beta)$. Such a variant is the subject of Exercise 22.4. For this method, a worst-case analysis shows that it takes on the order of $n$ steps to achieve a given level of precision. Xu et al. (1993) describes an efficient implementation of the long-step variant.

The  predictor-corrector  method  is  a  standard  technique  used  in  the  numeri¬ 
cal  solution  of  ordinary  differential  equations.  Mehrotra  (1992)  (see  also  Mehrotra 
1989)  was  the  first  to  apply  this  technique  in  the  context  of  interior-point  methods, 
although  the  related  notion  of  forming  power  series  approximations  was  suggested 
earlier  by  N.K.  Karmarkar  and  is  described  in  Adler  et  al.  (1989). 
Extensions


It’s  hard.  But  it’s  harder  to  ignore  it. 


C.  Stevens 


CHAPTER  23 


Integer  Programming 


Many  real-world  problems  could  be  modeled  as  linear  programs  except  that 
some  or  all  of  the  variables  are  constrained  to  be  integers.  Such  problems  are  called 
integer  programming  problems.  One  might  think  that  these  problems  wouldn’t  be 
much  harder  than  linear  programming  problems.  For  example,  we  saw  in  Chapter  14 
that  for  network  flow  problems  with  integer  data,  the  simplex  method  automatically 
produces  integer  solutions.  But  that  was  just  luck.  In  general,  one  can’t  expect  to 
get  integer  solutions;  in  fact,  as  we  shall  see  in  this  chapter,  integer  programming 
problems  turn  out  to  be  generally  much  harder  to  crack  than  linear  ones. 

There  are  many  important  real-world  problems  that  can  be  formulated  as  integer 
programming  problems.  The  subject  is  so  important  that  several  monographs  are 
devoted  entirely  to  it.  In  this  chapter,  we  shall  just  present  a  few  favorite  applications 
that  can  be  modeled  as  integer  programming  problems  and  then  we  will  discuss  one 
technique for solving problems in this class, called branch-and-bound.

1.  Scheduling  Problems 

There  are  many  problems  that  can  be  classified  as  scheduling  problems.  We 
shall  consider  just  two  related  problems  of  this  type:  the  equipment  scheduling  and 
crew  scheduling  problems  faced  by  large  airlines.  Airlines  determine  how  to  route 
their  planes  as  follows.  First,  a  number  of  specific  flight  legs  are  defined  based  on 
market  demand.  A  leg  is  by  definition  one  flight  taking  off  from  somewhere  at  some 
time  and  landing  somewhere  else  (hopefully).  For  example,  a  leg  could  be  a  flight 
from  New  York  directly  to  Chicago  departing  at  7:30  A.M.  Another  might  be  a  flight 
from  Chicago  to  San  Francisco  departing  at  1:00  P.M.  The  important  point  is  that 
these  legs  are  defined  by  market  demand,  and  it  is  therefore  not  clear  a  priori  how 
to  put  these  legs  together  in  such  a  way  that  the  available  aircraft  can  cover  all  of 
them.  That  is,  for  each  airplane,  the  airline  must  put  together  a  route  that  it  will  fly. 
A  route,  by  definition,  consists  of  a  sequence  of  flight  legs  for  which  the  destination 
of  one  leg  is  the  origin  of  the  next  (and,  of  course,  the  final  destination  must  be  the 
origin  of  the  first  leg,  forming  a  closed  loop). 

The  airline  scheduling  problems  are  generally  tackled  in  two  stages.  First,  rea¬ 
sonable  routes  are  identified  that  meet  various  regulatory  and  temporal  constraints 
(you  can’t  leave  somewhere  before  you’ve  arrived  there — time  also  must  be  reserved 
for  dropping  off  and  taking  on  passengers).  This  route-identification  problem  is  by 
no  means  trivial,  but  it  isn’t  our  main  interest  here,  so  we  shall  simply  assume  that 
a  collection  of  reasonable  routes  has  already  been  identified.  Given  the  potential 
routes, the second stage is to select a subset of them with the property that each leg is covered by exactly one route. If the collection of potential routes is sufficiently rich, we would expect there to be several feasible solutions. Therefore, as always, our goal is to pick an optimal one, which in this case we define as one that minimizes the total cost. To formulate this problem as an integer program, let

$$x_j = \begin{cases}1 & \text{if route } j \text{ is selected},\\ 0 & \text{otherwise},\end{cases}$$

$$a_{ij} = \begin{cases}1 & \text{if leg } i \text{ is part of route } j,\\ 0 & \text{otherwise},\end{cases}$$

and

$$c_j = \text{cost of using route } j.$$

With these notations, the equipment scheduling problem is to

$$\begin{aligned}\text{minimize}\quad & \sum_{j=1}^{n} c_jx_j\\ \text{subject to}\quad & \sum_{j=1}^{n} a_{ij}x_j = 1, && i = 1, 2, \ldots, m,\\ & x_j \in \{0, 1\}, && j = 1, 2, \ldots, n.\end{aligned}$$

This  model  is  often  called  a  set-partitioning  problem ,  since  the  set  of  legs  gets 
divided,  or  partitioned,  among  the  various  routes. 

The  flight  crews  do  not  necessarily  follow  the  same  aircraft  around  a  route.  The 
main  reason  is  that  the  constraints  that  apply  to  flight  crews  differ  from  those  for  the 
aircraft  (for  example,  flight  crews  need  to  sleep  occasionally).  Hence,  the  problem 
has  a  different  set  of  potential  routes.  Also,  it  is  sometimes  reasonable  to  allow 
crews  to  ride  as  passengers  on  some  legs  with  the  aim  of  getting  in  position  for  a 
subsequent  flight.  With  these  changes,  the  crew  scheduling  problem  is 


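$$\begin{aligned}\text{minimize}\quad & \sum_{j=1}^{n} c_jx_j\\ \text{subject to}\quad & \sum_{j=1}^{n} a_{ij}x_j \ge 1, && i = 1, 2, \ldots, m,\\ & x_j \in \{0, 1\}, && j = 1, 2, \ldots, n.\end{aligned}$$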

This  model  is  often  referred  to  as  a  set-covering  problem ,  since  the  crews  are  as¬ 
signed  so  as  to  cover  each  leg. 
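The difference between the two models shows up even on a toy instance. The sketch below solves both by brute-force enumeration, which is only sensible for a handful of routes; the instance data and the function name `best_selection` are made up for illustration.

```python
from itertools import product

def best_selection(a, cost, cover=False):
    """Enumerate x in {0,1}^n, where a[i][j] = 1 if leg i is part of route j.
    cover=False enforces equality (set partitioning); cover=True enforces >= 1."""
    m, n = len(a), len(cost)
    best = None
    for x in product((0, 1), repeat=n):
        counts = [sum(a[i][j] * x[j] for j in range(n)) for i in range(m)]
        ok = all(k >= 1 for k in counts) if cover else all(k == 1 for k in counts)
        if ok:
            total = sum(cj * xj for cj, xj in zip(cost, x))
            if best is None or total < best[0]:
                best = (total, x)
    return best

# Three legs, four candidate routes (columns): {0,1}, {1,2}, {2}, {0}.
a = [[1, 0, 0, 1],
     [1, 1, 0, 0],
     [0, 1, 1, 0]]
cost = [2, 2, 3, 4]
print(best_selection(a, cost))              # each leg covered exactly once
print(best_selection(a, cost, cover=True))  # each leg covered at least once
```

In the covering solution leg 1 is covered twice, which corresponds to a crew riding as passengers on that leg.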


2.  The  Traveling  Salesman  Problem 

Consider a salesman who needs to visit each of $n$ cities, which we shall enumerate as $0, 1, \ldots, n - 1$. His goal is to start from his home city, 0, and make a tour visiting each of the remaining cities once and only once and then returning to his home. We assume that the "distance" between each pair of cities, $c_{ij}$, is known (distance does not necessarily have to be distance; it could be travel time or, even better, the cost of travel) and that the salesman wants to make the tour that minimizes the total distance. This problem is called the traveling salesman problem. Figure 23.1 shows an example with seven cities. Clearly, a tour is determined by listing the cities in the order in which they will be visited. If we let $s_i$ denote the $i$th city visited, then the tour can be described simply as

$$s_0 = 0,\ s_1,\ s_2,\ \ldots,\ s_{n-1}.$$

Figure 23.1. A feasible tour in a seven-city traveling salesman problem.

The total number of possible tours is equal to the number of ways one can permute the $n - 1$ cities, i.e., $(n - 1)!$. Factorials are huge even for small $n$ (for example, $50! = 3.041 \times 10^{64}$). Hence, enumeration is out of the question. Our aim is to formulate this problem as an integer program that can be solved more quickly than by using enumeration.

It seems reasonable to introduce for each $(i, j)$ a decision variable $x_{ij}$ that will be equal to one if the tour visits city $j$ immediately after visiting city $i$; otherwise, it will be equal to zero. In terms of these variables, the objective function is easy to write:

$$\text{minimize}\ \sum_i \sum_j c_{ij}x_{ij}.$$

The tricky part is to formulate constraints to guarantee that the set of nonzero $x_{ij}$'s corresponds exactly to a bonafide tour. Some of the constraints are fairly obvious. For example, after the salesman visits city $i$, he must go to one and only one city next. We can write these constraints as

$$\sum_j x_{ij} = 1, \qquad i = 0, 1, \ldots, n - 1\tag{23.1}$$

(we call them the go-to constraints). Similarly, when the salesman visits a city, he must have come from one and only one prior city. That is,

$$\sum_i x_{ij} = 1, \qquad j = 0, 1, \ldots, n - 1\tag{23.2}$$

(by analogy we call these the come-from constraints). If the go-to and the come-from constraints were sufficient to ensure that the decision variables represent a tour, the traveling salesman problem would be quite easy to solve because it would just be an assignment problem, which can be solved efficiently by the simplex method. But unfortunately, these constraints are not sufficient, since they do not rule out the possibility of forming disjoint subtours. An example is shown in Figure 23.2.

Figure 23.2. Two disjoint subtours in a seven-city traveling salesman problem.

We need to introduce more constraints to guarantee connectivity of the graph that the tour represents. To see how to do this, consider a specific tour

$$s_0 = 0,\ s_1,\ s_2,\ \ldots,\ s_{n-1}.$$

Let $t_i$ for $i = 0, 1, \ldots, n$ be defined as the number of the stop along the tour at which city $i$ is visited; i.e., "when" city $i$ is visited along the tour. From this definition, we see that $t_0 = 0$, $t_{s_1} = 1$, $t_{s_2} = 2$, etc. In general,

$$t_{s_i} = i, \qquad i = 0, 1, \ldots, n - 1,$$

so that we can think of the $t_j$'s as being the inverse of the $s_i$'s. For a bonafide tour,

$$t_j = t_i + 1, \qquad \text{if } x_{ij} = 1.$$

Also, each $t_i$ is an integer between 0 and $n - 1$, inclusive. Hence, $t_j$ satisfies the following constraints:

$$t_j \ge \begin{cases}t_i + 1 - n & \text{if } x_{ij} = 0,\\ t_i + 1 & \text{if } x_{ij} = 1.\end{cases}$$

(Note that by subtracting $n$ in the $x_{ij} = 0$ case, we have effectively made the condition always hold.) These constraints can be written succinctly as

$$t_j \ge t_i + 1 - n(1 - x_{ij}), \qquad i \ge 0,\ j \ge 1,\ i \ne j.\tag{23.3}$$

Now,  these  constraints  were  derived  based  on  conditions  that  a  bonafide  tour  satis¬ 
fies.  It  turns  out  that  they  also  force  a  solution  to  be  a  bonafide  tour.  That  is,  they 
rule out subtours. To see this, suppose to the contrary that there exists a solution to (23.1), (23.2), and (23.3) that consists of at least two subtours. Consider a subtour that does not include city 0. Let $r$ denote the number of legs on this subtour. Clearly, $r \ge 2$. Now, sum (23.3) over all arcs on this subtour. On the left, we get the sum of the $t_j$'s over each city visited by the subtour. On the right, we get the same sum plus $r$. Cancelling the sums from the two sides, we get that

$$0 \ge r,$$

which  is  a  contradiction.  Hence,  the  traveling  salesman  problem  can  be  formulated 
as  the  following  integer  programming  problem: 

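\[
\begin{array}{lll}
\text{minimize} & \displaystyle\sum_{i,j} c_{ij} x_{ij} & \\[1ex]
\text{subject to} & \displaystyle\sum_{j=1}^{n} x_{ij} = 1, & i = 0, 1, \ldots, n-1, \\[1ex]
 & \displaystyle\sum_{i=1}^{n} x_{ij} = 1, & j = 0, 1, \ldots, n-1, \\[1ex]
 & t_j \ge t_i + 1 - n(1 - x_{ij}), & i \ge 0,\ j \ge 1,\ i \ne j, \\
 & t_0 = 0, & \\
 & x_{ij} \in \{0, 1\}, & \\
 & t_i \in \{0, 1, 2, \ldots\}. &
\end{array}
\]

Note that, for the $n$-city problem, there are $n^2 + n$ variables in this formulation.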
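The role of the visit-time constraints can be checked on small instances. The following Python sketch is only an illustration: the successor-list encoding `succ` (where `succ[i] = j` stands for $x_{ij} = 1$) and the helper names `visit_times` and `satisfies_23_3` are assumptions introduced here, not notation from the text. It builds the $t_i$'s by walking the tour from city 0 and then tests the inequalities (23.3) directly.

```python
from typing import List, Optional


def visit_times(succ: List[int]) -> Optional[List[int]]:
    """Return t with t[0] = 0 and t[succ[i]] = t[i] + 1 along the walk from city 0.

    succ is assumed to be a permutation (each city has one successor and one
    predecessor). Returns None when the walk from city 0 closes up before
    visiting every city, i.e. when the assignment contains a subtour.
    """
    n = len(succ)
    t = [-1] * n
    city, stop = 0, 0
    while t[city] == -1:
        t[city] = stop
        city = succ[city]
        stop += 1
    return t if stop == n else None


def satisfies_23_3(succ: List[int], t: List[int]) -> bool:
    """Check t_j >= t_i + 1 - n(1 - x_ij) for all i >= 0, j >= 1, i != j."""
    n = len(succ)
    for i in range(n):
        for j in range(1, n):
            if i == j:
                continue
            x_ij = 1 if succ[i] == j else 0
            if t[j] < t[i] + 1 - n * (1 - x_ij):
                return False
    return True


tour = [2, 0, 1]          # the tour 0 -> 2 -> 1 -> 0
t = visit_times(tour)
print(t, satisfies_23_3(tour, t))

subtours = [1, 0, 3, 2]   # two subtours: 0 <-> 1 and 2 <-> 3
print(visit_times(subtours))
```

For the two-subtour assignment, the cycle through cities 2 and 3 avoids city 0, so by the summation argument above no choice of $t$ can satisfy (23.3); the sketch reports this case by returning `None`.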


3.  Fixed  Costs 

The terms in an objective function often represent costs associated with engaging in an activity. Until now, we’ve always assumed that each of these terms is a linear function such as $cx$. However, it is sometimes more realistic to assume that there is a fixed cost for engaging in the activity plus a linear variable cost. That is, one such term might have the form

\[
c(x) =
\begin{cases}
0 & \text{if } x = 0, \\
K + cx & \text{if } x > 0.
\end{cases}
\]

If we assume that there is an upper bound on the size of $x$, then it turns out that such a function can be equivalently modeled using strictly linear functions at the expense of introducing one integer-valued variable. Indeed, suppose that $u$ is an upper bound on the $x$ variable. Let $y$ denote a $\{0, 1\}$-valued variable that is one when and only when $x > 0$. Then

\[ c(x) = Ky + cx. \]

Also, the condition that $y$ is one exactly when $x > 0$ can be guaranteed by introducing the following constraints:

\[
\begin{array}{l}
x \le uy, \\
x \ge 0, \\
y \in \{0, 1\}.
\end{array}
\]

The first constraint forces $y = 1$ whenever $x > 0$. When $x = 0$, both values of $y$ are feasible, but since the cost is being minimized and $K > 0$, the value $y = 0$ is selected. Of course, if the objective function has several terms with associated fixed costs, then this trick must be used on each of these terms.
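As a quick check of this modeling trick, the sketch below compares the fixed-cost function evaluated directly with the minimum of $Ky + cx$ over the values of $y$ allowed by $x \le uy$. The function names and the sample numbers ($K = 10$, $c = 3$, $u = 4$) are illustrative assumptions, not values from the text.

```python
def fixed_cost_direct(x: float, K: float, c: float) -> float:
    """c(x) = 0 if x == 0, and K + c*x if x > 0."""
    return 0.0 if x == 0 else K + c * x


def fixed_cost_via_binary(x: float, K: float, c: float, u: float) -> float:
    """Minimize K*y + c*x over y in {0, 1} subject to 0 <= x <= u*y."""
    feasible_y = [y for y in (0, 1) if 0 <= x <= u * y]
    return min(K * y + c * x for y in feasible_y)


K, c, u = 10.0, 3.0, 4.0   # hypothetical data with K > 0
for x in (0.0, 1.0, 2.5, 4.0):
    print(x, fixed_cost_direct(x, K, c), fixed_cost_via_binary(x, K, c, u))
```

The two columns agree for every $x$ in $[0, u]$. Note that the agreement at $x = 0$ relies on $K > 0$, so that the minimization picks $y = 0$.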

4.  Nonlinear  Objective  Functions 

Sometimes the terms in the objective function are not linear at all. For example, one such term could look like the function shown in Figure 23.3. In Chapter 25, we will discuss efficient algorithms that can be used in the presence of nonlinear objective functions, at least when they have appropriate convexity/concavity properties. In this section, we will show how to formulate an integer programming approximation to a general nonlinear term in the objective function. The first step is to approximate the nonlinear function by a continuous piecewise linear function.

The second step is to introduce integer variables that allow us to represent the piecewise linear function using linear relations. To see how to do this, first we decompose the variable $x$ into a sum,

\[ x = x_1 + x_2 + \cdots + x_k, \]

where $x_i$ denotes how much of the interval $[0, x]$ is contained in the $i$th linear segment of the piecewise linear function (see Figure 23.4). Of course, some of the initial segments will lie entirely within the interval $[0, x]$, one segment will lie partially in and partially out, and then the subsequent segments will lie entirely outside of the interval. Hence, we need to introduce constraints to guarantee that the initial $x_i$’s are equal to the length of their respective segments and that after the straddling segment the subsequent $x_i$’s are all zero. A little thought reveals that the following constraints do the trick (here $L_j$ is the length of the $j$th segment):

\[
\begin{array}{ll}
L_j w_j \le x_j \le L_j w_{j-1}, & j = 1, 2, \ldots, k, \\
w_0 = 1, & \\
w_j \in \{0, 1\}, & j = 1, 2, \ldots, k, \\
x_j \ge 0, & j = 1, 2, \ldots, k.
\end{array}
\]


Figure 23.3. A nonlinear function and a piecewise linear approximation to it.

Figure 23.4. A piecewise linear function.


Indeed, it follows from these constraints that $w_j \le w_{j-1}$ for $j = 1, 2, \ldots, k$. This inequality implies that once one of the $w_j$’s is zero, then all the subsequent ones must be zero. If $w_j = w_{j-1} = 1$, the two-sided inequality on $x_j$ reduces to $L_j \le x_j \le L_j$. That is, $x_j = L_j$. Similarly, if $w_j = w_{j-1} = 0$, then the two-sided inequality reduces to $x_j = 0$. The only other case is when $w_j = 0$ but $w_{j-1} = 1$. In this case, the two-sided inequality becomes $0 \le x_j \le L_j$. Therefore, in all cases, we get what we want. Now with this decomposition we can write the piecewise linear function as

\[ K + c_1 x_1 + c_2 x_2 + \cdots + c_k x_k. \]
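The decomposition can be made concrete with a short sketch. The code below is an illustration under stated assumptions: the function names, the segment lengths, the slopes, and the intercept are all made up for the example. It splits a given $x$ into the pieces $x_j$, chooses the $w_j$'s, verifies the two-sided constraints with $w_0 = 1$, and evaluates $K + \sum_j c_j x_j$.

```python
from typing import List, Tuple


def decompose(x: float, lengths: List[float]) -> Tuple[List[float], List[int]]:
    """Split x (assumed to lie in [0, sum(lengths)]) into x_j and w_j.

    x_j is the part of [0, x] lying in segment j; w_j = 1 exactly when
    segment j is completely covered.
    """
    xs, ws = [], []
    remaining = x
    for L in lengths:
        xj = min(remaining, L)
        xs.append(xj)
        ws.append(1 if remaining >= L else 0)
        remaining -= xj
    return xs, ws


def satisfies_constraints(xs: List[float], ws: List[int], lengths: List[float]) -> bool:
    """Check L_j w_j <= x_j <= L_j w_{j-1} and x_j >= 0, with w_0 = 1."""
    w_prev = 1
    for xj, wj, L in zip(xs, ws, lengths):
        if not (L * wj <= xj <= L * w_prev and xj >= 0):
            return False
        w_prev = wj
    return True


def piecewise_value(xs: List[float], slopes: List[float], K: float) -> float:
    """Evaluate K + c_1 x_1 + ... + c_k x_k."""
    return K + sum(c * xj for c, xj in zip(slopes, xs))


lengths = [2.0, 3.0, 1.0]   # hypothetical segment lengths L_1, L_2, L_3
slopes = [1.0, 0.5, 2.0]    # hypothetical slopes c_1, c_2, c_3
xs, ws = decompose(3.5, lengths)
print(xs, ws, satisfies_constraints(xs, ws, lengths), piecewise_value(xs, slopes, 4.0))
```

Here the first segment is fully used, the second is the straddling segment, and the third is untouched, which is exactly the pattern the constraints are designed to enforce.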


5.  Branch-and-Bound 

In the previous sections, we presented a variety of problems that can be formulated as integer programming problems. As it happens, all of them had the property that the integer variables took just one of two values, namely, zero or one. However, there are other integer programming problems in which the integer variables can be any nonnegative integer. Hence, we define the standard integer programming problem as follows:

\[
\begin{array}{ll}
\text{maximize} & c^T x \\
\text{subject to} & Ax \le b \\
 & x \ge 0 \\
 & x \text{ has integer components.}
\end{array}
\]

In this section, we shall present an algorithm for solving these problems. The algorithm is called branch-and-bound. It involves solving a (potentially) large number of (related) linear programming problems in its search for an optimal integer solution. The algorithm starts out with the following wishful approach: first ignore the constraint that the components of $x$ be integers, solve the resulting linear programming problem, and hope that the solution vector has all integer components. Of course, hopes are almost always unfulfilled, and so a backup strategy is needed. The simplest strategy would be to round each resulting solution value to its nearest integer value. Unfortunately, this naive strategy can be quite bad. In fact, the integer solution so obtained might not even be feasible. This shouldn’t be surprising: the solution to a linear programming problem is at a vertex of the feasible set, so a naive move away from it can easily land outside of the feasible set.

Figure 23.5. An integer programming problem. The dots represent the feasible integer points, and the shaded region shows the feasible region for the LP-relaxation.

To  be  concrete,  consider  the  following  example: 

\[
\begin{array}{ll}
\text{maximize} & 17x_1 + 12x_2 \\
\text{subject to} & 10x_1 + 7x_2 \le 40 \\
 & x_1 + x_2 \le 5 \\
 & x_1, x_2 \ge 0 \\
 & x_1, x_2 \text{ integers.}
\end{array}
\]


The linear programming problem obtained by dropping the integrality constraint is called the LP-relaxation. Since it has fewer constraints, its optimal objective value provides an upper bound on the optimal objective value of the integer programming problem. Figure 23.5 shows the feasible points for the integer programming problem as well as the feasible polytope for its LP-relaxation. The solution to the LP-relaxation is at $(x_1, x_2) = (5/3, 10/3)$, and the optimal objective value is $205/3 = 68.33$. Rounding each component of this solution to the nearest integer, we get $(2, 3)$, which is not even feasible. The feasible integer solution that is closest to the LP-optimal solution is $(1, 3)$, but we can see from Figure 23.5 that this solution is not the optimal solution to the integer programming problem. In fact, it is easy to see from the figure that the optimal integer solution is either $(1, 4)$ or $(4, 0)$. To make the problem interesting, we’ve chosen the objective function to make the more distant point $(4, 0)$ be the optimal solution.

Figure 23.6. The feasible subregions formed by the first branch.
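Because this example is tiny, the claims above can be confirmed by direct enumeration. The following sketch is a brute-force check, not part of the branch-and-bound method; the helper names `feasible` and `objective` are illustrative choices.

```python
from fractions import Fraction


def feasible(x1, x2) -> bool:
    """Constraints of the example: 10 x1 + 7 x2 <= 40, x1 + x2 <= 5, x >= 0."""
    return 10 * x1 + 7 * x2 <= 40 and x1 + x2 <= 5 and x1 >= 0 and x2 >= 0


def objective(x1, x2):
    return 17 * x1 + 12 * x2


# Optimal solution of the LP-relaxation, in exact arithmetic.
lp_opt = (Fraction(5, 3), Fraction(10, 3))
print(objective(*lp_opt))                    # the LP bound 205/3

# Naive rounding of the LP solution.
rounded = tuple(round(v) for v in lp_opt)
print(rounded, feasible(*rounded))

# Enumerate all integer points in the box [0, 5] x [0, 5].
points = [(a, b) for a in range(6) for b in range(6) if feasible(a, b)]
best = max(points, key=lambda p: objective(*p))
print(best, objective(*best))
```

The rounded point $(2, 3)$ violates $10x_1 + 7x_2 \le 40$, and the enumeration confirms that $(4, 0)$, with value 68, beats $(1, 4)$, with value 65.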

Of course, we can solve only very small problems by the graphical method: to solve larger problems, an algorithm is required, which we now describe. Consider variable $x_1$ in the optimal solution to the LP-relaxation. Its value is $5/3$. In the optimal solution to the integer programming problem, it will be an integer. Hence, it will satisfy either $x_1 \le 1$ or $x_1 \ge 2$. We consider these two cases separately. Let $P_1$ denote the linear programming problem obtained by adding the constraint $x_1 \le 1$ to the LP-relaxation, and let $P_2$ denote the problem obtained by including the other possibility, $x_1 \ge 2$. The feasible regions for $P_1$ and $P_2$ are shown in Figure 23.6. Let us study $P_1$ first. It is clear from Figure 23.6 that the optimal solution is at $(x_1, x_2) = (1, 4)$ with an objective value of 65. Our algorithm has found its first feasible solution to the integer programming problem. We record this solution as the best-so-far. Of course, better ones may (in this case, will) come along later.

Now let’s consider $P_2$. Looking at Figure 23.6 and doing a small amount of calculation, we see that the optimal solution is at $(x_1, x_2) = (2, 20/7)$. In this case, the objective function value is $478/7 = 68.29$. Now if this value had turned out to be less than the best-so-far value, then we’d be done, since any integer solution that lies within the feasible region for $P_2$ would have a smaller value yet. But this is not the case, and so we must continue our systematic search. Since $x_2 = 20/7 = 2.86$, we divide $P_2$ into two subproblems, one in which the constraint $x_2 \le 2$ is added and one with $x_2 \ge 3$ added.

Before considering these two new cases, note that we are starting to develop a tree of linear programming subproblems. This tree is called the enumeration tree. The tree as far as we have investigated is shown in Figure 23.7. The double box around $P_1$ indicates that that part of the tree is done: i.e., there are no branches emanating from $P_1$; it is a leaf node. The two empty boxes below $P_2$ indicate two subproblems that have yet to be studied. Let’s proceed by looking at the left branch, which corresponds to adding the constraint $x_2 \le 2$ to what we had before. We denote this subproblem by $P_3$. Its feasible region is shown in Figure 23.8, from which we see that the optimal solution is at $(2.6, 2)$. The associated optimal objective value is 68.2. Again, the solution is fractional. Hence, the process of subdividing must continue. This time we subdivide based on the values of $x_1$. Indeed, we consider two cases: either $x_1 \le 2$ or $x_1 \ge 3$.

Figure 23.7. The beginnings of the enumeration tree.

Figure 23.8. The refinement of $P_2$ to $P_3$.

Figure 23.9. The enumeration tree after solving $P_3$.


Figure 23.9 shows the enumeration tree as it now stands. At this juncture, there are three directions in which we could proceed. We could either study the other branch under $P_2$ or work on one of the two branches sitting under $P_3$. If we were to systematically solve all the problems on a given level of the tree before going deeper, we would be performing what is referred to as a breadth-first search. On the other hand, going deep before going wide is called a depth-first search. For reasons that we shall explain later, it turns out to be better to do a depth-first search. And, to be specific, let us always choose the left branch before the right branch (in practice, there are much better rules that one can employ here). So our next linear programming problem is the one that we get by adding the constraint that $x_1 \le 2$ to the constraints that defined $P_3$. Let us call this new problem $P_4$. Its feasible region is shown in Figure 23.10. It is easy to see that the optimal solution to this problem is $(2, 2)$, with an objective value of 58. This solution is an integer solution, so it is feasible for the integer programming problem. But it is not better than our best-so-far. Nonetheless, we do not need to consider any further subproblems below this one in the enumeration tree.

Since problem $P_4$ is a leaf in the enumeration tree, we need to work back up the tree looking for the first node that has an unsolved problem sitting under it. For the case at hand, the unsolved problem is on the right branch underneath $P_3$. Let us call this problem $P_5$. It too is depicted in Figure 23.10. The optimal solution is $(3, 1.43)$, with an optimal objective function value of 68.14. Since this objective function value is larger than the value of the best-so-far integer solution, we must further investigate by dividing into two possibilities, either $x_2 \le 1$ or $x_2 \ge 2$. At this point, the enumeration tree looks like that shown in Figure 23.11.

Figure 23.10. The refinement of $P_3$ to $P_4$.

Let $P_6$ denote the linear programming problem that we get on the left branch under $P_5$. Its feasible region is shown in Figure 23.12. The optimal solution is $(3.3, 1)$, and the associated objective value is 68.1. Again, the solution is fractional and has a higher objective value than the best-so-far integer solution. Hence, it must be subdivided based on $x_1 \le 3$ as opposed to $x_1 \ge 4$. We denote these two problems by $P_7$ and $P_8$; their feasible regions are as depicted in Figure 23.13. The solution to $P_7$ is $(3, 1)$, and the objective value is 63. This is an integer solution, but it is not better than the best-so-far. Nonetheless, the node becomes a leaf, since the solution is integral. Hence, we move on to $P_8$. The solution to this problem is also integral, $(4, 0)$. Also, the objective value associated with this solution is 68, which is a new record for feasible integer solutions. Hence, this solution becomes our best-so-far. The enumeration tree at this point is shown in Figure 23.14.

Now we need to go back and solve the remaining unsolved problems under $P_5$ and $P_2$ (and any subproblems thereof). It turns out that both these subproblems are infeasible, and so no more subdivisions are needed. The enumeration tree is now completely fathomed and is shown in Figure 23.15. We can now assert that the optimal solution to the original integer programming problem was found in problem $P_8$. The solution is $(x_1, x_2) = (4, 0)$, and the associated objective function value is 68.
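The walkthrough above can be reproduced with a compact depth-first branch-and-bound sketch. This is a minimal illustration under stated assumptions, not the implementation that accompanies the book: it only handles two variables, it solves each LP by enumerating vertices (intersections of pairs of constraint lines) in exact rational arithmetic and assumes a bounded feasible region, it always branches on the first fractional variable, and it always explores the left branch first. The function names and the optional `trace` list are introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import floor


def solve_lp_2d(c, cons):
    """Maximize c[0]*x1 + c[1]*x2 subject to a1*x1 + a2*x2 <= b for (a1, a2, b) in cons.

    Vertex enumeration; assumes a bounded feasible region.
    Returns (value, (x1, x2)) or None if the problem is infeasible.
    """
    best = None
    for (a1, a2, b), (d1, d2, e) in combinations(cons, 2):
        det = a1 * d2 - a2 * d1
        if det == 0:
            continue  # parallel constraint lines have no unique intersection
        x = (Fraction(b * d2 - a2 * e, det), Fraction(a1 * e - b * d1, det))
        if all(p * x[0] + q * x[1] <= r for p, q, r in cons):
            val = c[0] * x[0] + c[1] * x[1]
            if best is None or val > best[0]:
                best = (val, x)
    return best


def branch_and_bound(c, cons, best=None, trace=None):
    """Depth-first branch-and-bound, left branch first. Returns the best (value, x)."""
    lp = solve_lp_2d(c, cons)
    if trace is not None:
        trace.append(lp)                      # record every LP that gets solved
    if lp is None:
        return best                           # infeasible subproblem: leaf
    val, x = lp
    if best is not None and val <= best[0]:
        return best                           # bound: cannot beat the best-so-far
    fractional = [k for k in (0, 1) if x[k].denominator != 1]
    if not fractional:
        return lp                             # integral and better: new best-so-far
    k = fractional[0]
    lo = floor(x[k])
    e = (1, 0) if k == 0 else (0, 1)
    left = cons + [(e[0], e[1], lo)]                 # x_k <= floor(x_k)
    right = cons + [(-e[0], -e[1], -(lo + 1))]       # x_k >= floor(x_k) + 1
    best = branch_and_bound(c, left, best, trace)
    return branch_and_bound(c, right, best, trace)


c = (17, 12)
cons = [(10, 7, 40), (1, 1, 5), (-1, 0, 0), (0, -1, 0)]  # last two: x1, x2 >= 0
trace = []
best = branch_and_bound(c, cons, trace=trace)
print(best, len(trace))
```

On the example of this section, the sketch visits the subproblems in the same order as the text (the relaxation, then $P_1$ through $P_8$, then the two infeasible right branches), for eleven LPs in all, and ends with the solution $(4, 0)$ of value 68.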

There are three reasons why depth-first search is generally the preferred order in which to fathom the enumeration tree. The first is based on the observation that most integer solutions lie deep in the tree. There are two advantages to finding integer feasible solutions early. The first is simply the fact that it is better to have a feasible solution than nothing in case one wishes to abort the solution process early. But more importantly, identifying a feasible integer solution can result in subsequent nodes of the enumeration tree being made into leaves simply because the optimal objective function value associated with that node is lower than the value of the best-so-far integer solution. Making such nodes into leaves is called pruning the tree and can account for tremendous gains in efficiency.

Figure 23.11. The enumeration tree after solving $P_5$. The double box around $P_4$ indicates that it is a leaf in the tree.

A  second  reason  to  favor  depth-first  search  is  the  simple  fact  that  it  is  very  easy 
to  code  the  algorithm  as  a  recursively  defined  function.  This  may  seem  trite,  but  one 
shouldn’t  underestimate  the  value  of  code  simplicity  when  implementing  algorithms 
that  are  otherwise  quite  sophisticated,  such  as  the  one  we  are  currently  describing. 

The third reason to favor depth-first search is perhaps the most important. It is based on the observation that as one moves deeper in the enumeration tree, each subsequent linear programming problem is obtained from the preceding one by simply adding (or refining) an upper/lower bound on one specific variable. To see why this is an advantage, consider for example problem $P_2$, which is a refinement of $P_0$. The optimal dictionary for problem $P_0$ is recorded as

\[
\begin{array}{rcl}
\zeta &=& \frac{205}{3} - \frac{5}{3} w_1 - \frac{1}{3} w_2 \\[0.5ex]
x_1 &=& \frac{5}{3} - \frac{1}{3} w_1 + \frac{7}{3} w_2 \\[0.5ex]
x_2 &=& \frac{10}{3} + \frac{1}{3} w_1 - \frac{10}{3} w_2.
\end{array}
\]

Figure 23.12. The refinement of $P_5$ to $P_6$.

Figure 23.13. The refinement of $P_6$ to $P_7$ and $P_8$.

Problem $P_2$ is obtained from $P_0$ by adding the constraint that $x_1 \ge 2$. Introducing a variable, $g_1$, to stand for the difference between $x_1$ and this lower bound and using the dictionary above to write $x_1$ in terms of the nonbasic variables, we get

\[ g_1 = x_1 - 2 = -\frac{1}{3} - \frac{1}{3} w_1 + \frac{7}{3} w_2. \]

Figure 23.14. The enumeration tree after solving $P_6$, $P_7$, and $P_8$.

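Therefore, we can use the following dictionary as a starting point for the solution of $P_2$:

\[
\begin{array}{rcl}
\zeta &=& \frac{205}{3} - \frac{5}{3} w_1 - \frac{1}{3} w_2 \\[0.5ex]
x_1 &=& \frac{5}{3} - \frac{1}{3} w_1 + \frac{7}{3} w_2 \\[0.5ex]
x_2 &=& \frac{10}{3} + \frac{1}{3} w_1 - \frac{10}{3} w_2 \\[0.5ex]
g_1 &=& -\frac{1}{3} - \frac{1}{3} w_1 + \frac{7}{3} w_2.
\end{array}
\]

Figure 23.15. The complete enumeration tree.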


This dictionary is dual feasible but primal infeasible. Therefore, the dual simplex method is likely to find a new optimal solution in very few iterations. According to the dual simplex method, variable $g_1$ is the leaving variable and $w_2$ is the corresponding entering variable. Making the pivot, we get the following dictionary:

\[
\begin{array}{rcl}
\zeta &=& \frac{478}{7} - \frac{12}{7} w_1 - \frac{1}{7} g_1 \\[0.5ex]
x_1 &=& 2 + g_1 \\[0.5ex]
x_2 &=& \frac{20}{7} - \frac{1}{7} w_1 - \frac{10}{7} g_1 \\[0.5ex]
w_2 &=& \frac{1}{7} + \frac{1}{7} w_1 + \frac{3}{7} g_1.
\end{array}
\]

This dictionary is optimal for $P_2$. In general, the dual simplex method will take more than one iteration to reoptimize, but nonetheless, one does expect it to get to a new optimal solution quickly.
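For readers who want to check the pivot by hand, the algebra can be spelled out as follows. This is a worked verification using only the dictionaries of this section: the $g_1$ row is solved for the entering variable $w_2$ (the only nonbasic variable with a positive coefficient in that row), and the result is substituted into the remaining rows.

```latex
\begin{align*}
g_1 = -\tfrac{1}{3} - \tfrac{1}{3} w_1 + \tfrac{7}{3} w_2
  \quad &\Longrightarrow \quad
  w_2 = \tfrac{1}{7} + \tfrac{1}{7} w_1 + \tfrac{3}{7} g_1, \\[1ex]
x_1 &= \tfrac{5}{3} - \tfrac{1}{3} w_1 + \tfrac{7}{3} w_2 = 2 + g_1, \\[1ex]
x_2 &= \tfrac{10}{3} + \tfrac{1}{3} w_1
       - \tfrac{10}{3}\left(\tfrac{1}{7} + \tfrac{1}{7} w_1 + \tfrac{3}{7} g_1\right)
     = \tfrac{20}{7} - \tfrac{1}{7} w_1 - \tfrac{10}{7} g_1, \\[1ex]
\zeta &= \tfrac{205}{3} - \tfrac{5}{3} w_1
       - \tfrac{1}{3}\left(\tfrac{1}{7} + \tfrac{1}{7} w_1 + \tfrac{3}{7} g_1\right)
     = \tfrac{478}{7} - \tfrac{12}{7} w_1 - \tfrac{1}{7} g_1.
\end{align*}
```

Setting the nonbasic variables $w_1$ and $g_1$ to zero gives $(x_1, x_2) = (2, 20/7)$ and $\zeta = 478/7$, in agreement with the solution of $P_2$ found graphically.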

We  end  this  chapter  by  remarking  that  many  real  problems  have  the  property 
that  some  variables  must  be  integers  but  others  can  be  real  valued.  Such  problems 
are  called  mixed  integer  programming  problems.  It  should  be  easy  to  see  how  to 
modify  the  branch-and-bound  technique  to  handle  such  problems  as  well. 


Exercises 

23.1 Knapsack Problem. Consider a picnicker who will be carrying a knapsack that holds a maximum amount $b$ of “stuff.” Suppose that our picnicker must decide what to take and what to leave behind. The $j$th thing that might be taken occupies $a_j$ units of space in the knapsack and will bring $c_j$ amount of “enjoyment.” The knapsack problem then is to maximize enjoyment subject to the constraint that the stuff brought must fit into the knapsack:

\[
\begin{array}{lll}
\text{maximize} & \displaystyle\sum_{j=1}^{n} c_j x_j & \\[1ex]
\text{subject to} & \displaystyle\sum_{j=1}^{n} a_j x_j \le b & \\[1ex]
 & x_j \in \{0, 1\}, & j = 1, 2, \ldots, n.
\end{array}
\]

This apparently simple problem has proved difficult for general-purpose branch-and-bound algorithms. To see why, analyze the special case in which each thing contributes the same amount of enjoyment, i.e., $c_j = c$ for all $j$, and takes up exactly two units of space, i.e., $a_j = 2$ for all $j$. Suppose also that the knapsack holds $n$ units of stuff.

(a) What is the optimal solution when $n$ is even? when $n$ is odd?

(b) How many subproblems must the branch-and-bound algorithm consider when $n$ is odd?

23.2 Vehicle Routing. Consider the dispatching of delivery vehicles (for example, mail trucks, fuel-oil trucks, newspaper delivery trucks, etc.). Typically, there is a fleet of vehicles that must be routed to deliver goods from a depot to a given set of $n$ drop-points. Given a set of feasible delivery routes and the cost associated with each one, explain how to formulate the problem of minimizing the total delivery cost as a set-partitioning problem.

23.3 Explain how to modify the integer programming reformulation of continuous piecewise linear functions so that it covers piecewise linear functions having discontinuities at the junctions of the linear segments. Can fixed costs be handled with this approach?

Notes

Standard references for integer programming include the classic text by Garfinkel and Nemhauser (1972) and the more recent text by Nemhauser and Wolsey (1988). Bill Cook’s recent book (Cook 2012) on the traveling salesman problem gives an entertaining historical and mathematical perspective on this particular application of integer programming.


CHAPTER  24 


Quadratic  Programming 


In Chapter 23, we studied a generalization of the linear programming problem in which variables were constrained to take on integer values. In this chapter, we consider a generalization of a different kind. Namely, we shall study the class of problems that would be linear programs except that the objective function is permitted to include terms involving products of pairs of variables. Such terms are called quadratic terms, and the problems we shall study are called quadratic programming problems.

We have two reasons for being interested in quadratic programming problems. First, on the practical side, there are many real-world optimization problems that fall into this category. This is so because most real-world applications have an element of uncertainty to them and that uncertainty is modeled by including a sum of squares deviation, i.e., variance, as a measure of the robustness of the solution. It is often possible to arrange it so that these quadratic robustness terms appear only in the objective function. The quadratic version of the portfolio selection problem studied in Chapter 13 is one such example; there are many others. The second reason for our interest in quadratic programming problems is that they form a bridge to the much broader subject of convex programming that we shall take up in Chapter 25.

We  begin  this  chapter  with  a  quadratic  variant  of  the  portfolio  selection  problem. 


1.  The  Markowitz  Model 

Harry Markowitz received the 1990 Nobel Prize in Economics for his portfolio optimization model in which the tradeoff between risk and reward is explicitly treated. We shall briefly describe this model in its simplest form. We start by reintroducing the basic framework of the problem. Those who have read Chapter 13 will note that the first few paragraphs here are a repeat of what was written there.

Given  a  collection  of  potential  investments  (indexed,  say,  from  1  to  n),  let 
Rj  denote  the  return  in  the  next  time  period  on  investment  j,  j  =  1, . . . ,  n.  In 
general,  Rj  is  a  random  variable,  although  some  investments  may  be  essentially 
deterministic. 

A portfolio is determined by specifying what fraction of one’s assets to put into each investment. That is, a portfolio is a collection of nonnegative numbers $x_j$, $j = 1, \ldots, n$, that sum to one. The return (on each dollar) one would obtain using a given portfolio is given by

\[ R = \sum_j x_j R_j. \]

The reward associated with such a portfolio is defined as the expected return:

\[ \mathbb{E} R = \sum_j x_j \mathbb{E} R_j. \]

If reward alone were the issue, the problem would be trivial: simply put everything in the investment with the highest expected return. But unfortunately, investments with high reward typically also carry a high level of risk. That is, even though they are expected to do very well in the long run, they also tend to be erratic in the short term. Markowitz defined the risk associated with an investment to be the variance of the return:

\[
\operatorname{Var}(R) = \mathbb{E}(R - \mathbb{E}R)^2
 = \mathbb{E}\Bigl(\sum_j x_j (R_j - \mathbb{E}R_j)\Bigr)^2
 = \mathbb{E}\Bigl(\sum_j x_j \tilde{R}_j\Bigr)^2,
\]

where $\tilde{R}_j = R_j - \mathbb{E} R_j$. One would like to maximize the reward while at the same time not incur excessive risk. In the Markowitz model, one forms a linear combination of the mean and the variance (parametrized here by $\mu$) and minimizes that:

\[
\begin{array}{lll}
\text{minimize} & \displaystyle -\sum_j x_j \mathbb{E} R_j + \mu\, \mathbb{E}\Bigl(\sum_j x_j \tilde{R}_j\Bigr)^2 & \\[1ex]
\text{subject to} & \displaystyle \sum_j x_j = 1 & \\[1ex]
 & x_j \ge 0, & j = 1, 2, \ldots, n.
\end{array}
\tag{24.1}
\]

Here, as in Chapter 13, $\mu$ is a positive parameter that represents the importance of risk relative to reward: high values of $\mu$ tend to minimize risk at the expense of reward, whereas low values put more weight on reward.

Again, as in Chapter 13, whenever there are individual investments that are negatively correlated, i.e., one is likely to go up exactly on those days where the other is likely to go down, it is wise to buy some of each. This is called hedging. In statistics, the so-called covariance matrix is the key to identifying negative correlations. And, the covariance matrix is what appears in the Markowitz model. To see it, let us expand the square in our expression for the variance of the portfolio:

\[
\begin{aligned}
\mathbb{E}\Bigl(\sum_j x_j \tilde{R}_j\Bigr)^2
 &= \mathbb{E}\Bigl(\sum_i x_i \tilde{R}_i\Bigr)\Bigl(\sum_j x_j \tilde{R}_j\Bigr) \\
 &= \sum_i \sum_j x_i x_j \mathbb{E}(\tilde{R}_i \tilde{R}_j) \\
 &= \sum_i \sum_j x_i x_j C_{ij},
\end{aligned}
\]

where

\[ C_{ij} = \mathbb{E}(\tilde{R}_i \tilde{R}_j) \]

is the covariance matrix. Hence, problem (24.1) can be rewritten as

\[
\begin{array}{lll}
\text{minimize} & \displaystyle -\sum_j r_j x_j + \mu \sum_i \sum_j x_i x_j C_{ij} & \\[1ex]
\text{subject to} & \displaystyle \sum_j x_j = 1 & \\[1ex]
 & x_j \ge 0, & j = 1, 2, \ldots, n,
\end{array}
\tag{24.2}
\]

where we have introduced $r_j = \mathbb{E} R_j$ for the mean return on investment $j$.
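To see how the covariance terms reward hedging, the objective of (24.2) can be evaluated for a toy case. The sketch below is an illustration only: the function name and the two-asset data (equal mean returns of 1.1 and perfectly negatively correlated deviations) are made-up assumptions, not numbers from the tables in this chapter.

```python
from typing import List


def markowitz_objective(x: List[float], r: List[float],
                        C: List[List[float]], mu: float) -> float:
    """Evaluate -sum_j r_j x_j + mu * sum_i sum_j x_i x_j C_ij."""
    n = len(x)
    reward = sum(r[j] * x[j] for j in range(n))
    risk = sum(x[i] * x[j] * C[i][j] for i in range(n) for j in range(n))
    return -reward + mu * risk


# Hypothetical data: two assets with the same mean return and
# perfectly negatively correlated returns.
r = [1.1, 1.1]
C = [[0.04, -0.04],
     [-0.04, 0.04]]
mu = 2.0

all_in_one = markowitz_objective([1.0, 0.0], r, C, mu)
hedged = markowitz_objective([0.5, 0.5], r, C, mu)
print(all_in_one, hedged)
```

Both portfolios have the same reward, but splitting the money evenly drives the variance term to zero, so the hedged portfolio has the smaller objective value.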

Solving problem (24.2) requires an estimate of the mean return for each of the investments as well as an estimate of the covariance matrix. These quantities are not known theoretically; instead, they must be estimated by looking at historical data. For example, Table 24.1 shows annual returns from 1973 to 1994 for eight different possible investments: U.S. 3-Month T-Bills, U.S. Government Long Bonds, S&P 500, Wilshire 5000 (a collection of small company stocks), NASDAQ Composite, Lehman Brothers Corporate Bonds Index, EAFE (a securities index for Europe, Asia, and the Far East), and Gold. Let $R_j(t)$ denote the return on investment $j$ in year $1972 + t$. One way to estimate the mean $\mathbb{E} R_j$ is simply to take the average of the historical returns:

\[ r_j = \mathbb{E} R_j = \frac{1}{T} \sum_{t=1}^{T} R_j(t). \]

There are two drawbacks to this simple formula. First, whatever happened in 1973 certainly has less bearing on the future than what happened in 1994. Hence, giving all the past returns equal weight puts too much emphasis on the distant past at the expense of the recent past. A better estimate is obtained by using a discounted sum:

\[ \mathbb{E} R_j = \frac{\sum_{t=1}^{T} p^{T-t} R_j(t)}{\sum_{t=1}^{T} p^{T-t}}. \]
Here, $p$ is a discount factor. Putting $p = 0.9$ gives a weighted average that puts more weight on the most recent years. To see the effect of discounting the past, consider the Gold investment. The unweighted average return is 1.129, whereas the weighted average is 1.053. Most experts in 1995 felt that a 5.3% return represented a more realistic expectation than a 12.9% return. In the results that follow, all expectations are estimated by computing weighted averages using $p = 0.9$.

Year   US 3-Month   US Gov.      S&P     Wilshire   NASDAQ      Lehman Bros.   EAFE    Gold
       T-Bills      Long Bonds   500     5000       Composite   Corp. Bonds
1973   1.075        0.942        0.852   0.815      0.698       1.023          0.851   1.677
1974   1.084        1.020        0.735   0.716      0.662       1.002          0.768   1.722
1975   1.061        1.056        1.371   1.385      1.318       1.123          1.354   0.760
1976   1.052        1.175        1.236   1.266      1.280       1.156          1.025   0.960
1977   1.055        1.002        0.926   0.974      1.093       1.030          1.181   1.200
1978   1.077        0.982        1.064   1.093      1.146       1.012          1.326   1.295
1979   1.109        0.978        1.184   1.256      1.307       1.023          1.048   2.212
1980   1.127        0.947        1.323   1.337      1.367       1.031          1.226   1.296
1981   1.156        1.003        0.949   0.963      0.990       1.073          0.977   0.688
1982   1.117        1.465        1.215   1.187      1.213       1.311          0.981   1.084
1983   1.092        0.985        1.224   1.235      1.217       1.080          1.237   0.872
1984   1.103        1.159        1.061   1.030      0.903       1.150          1.074   0.825
1985   1.080        1.366        1.316   1.326      1.333       1.213          1.562   1.006
1986   1.063        1.309        1.186   1.161      1.086       1.156          1.694   1.216
1987   1.061        0.925        1.052   1.023      0.959       1.023          1.246   1.244
1988   1.071        1.086        1.165   1.179      1.165       1.076          1.283   0.861
1989   1.087        1.212        1.316   1.292      1.204       1.142          1.105   0.977
1990   1.080        1.054        0.968   0.938      0.830       1.083          0.766   0.922
1991   1.057        1.193        1.304   1.342      1.594       1.161          1.121   0.958
1992   1.036        1.079        1.076   1.090      1.174       1.076          0.878   0.926
1993   1.031        1.217        1.100   1.113      1.162       1.110          1.326   1.146
1994   1.045        0.889        1.012   0.999      0.968       0.965          1.078   0.990

Table 24.1. Returns per dollar for each of eight investments over several years. That is, $1 invested in US 3-Month T-Bills on January 1, 1973, was worth $1.075 on December 31, 1973.

The second issue concerns the estimation of means (not variances). An investment that returns 1.1 one year and 0.9 the next has an (unweighted) average return of 1, that is, no gain or loss. However, one dollar invested will actually be worth $(1.1)(0.9) = 0.99$ at the end of the second year. While a 1% error is fairly small, consider what happens if the return is 2.0 one year and then 0.5 the next. Clearly, the value of one dollar at the end of the 2 years is $(2.0)(0.5) = 1$, but the average of the two returns is $(2.0 + 0.5)/2 = 1.25$. There is a very significant difference between an investment that is flat and one that yields a 25% return in 2 years. This is obviously an effect for which a correction is required. We need to average 2.0 and 0.5 in such a way that they cancel out, and this cancellation must work not only for 2.0 and 0.5 but for every positive number and its reciprocal. The trick is to average the logarithm of the returns (and then exponentiate the average). The logarithm has the correct effect of cancelling a return $r$ and its reciprocal:

\[ \log r + \log \frac{1}{r} = 0. \]

Hence, we estimate means from Table 24.1 using

    $\mathbb{E} R_j = \exp\left( \frac{\sum_{t=1}^{T} p^{T-t} \log R_j(t)}{\sum_{t=1}^{T} p^{T-t}} \right).$

Investments, in column order: Gold, US 3-Month T-Bills, Lehman Bros. Corp. Bonds,
NASDAQ Composite, S&P 500, EAFE.

$\mu$      Nonzero weights (left to right)      Mean    Std. dev.
0.0      1.000                                1.122   0.227
0.1      0.603  0.397                         1.121   0.147
1.0      0.876  0.124                         1.120   0.133
2.0      0.036  0.322  0.549  0.092           1.108   0.102
4.0      0.487  0.189  0.261  0.062           1.089   0.057
8.0      0.713  0.123  0.117  0.047           1.079   0.037
1024.0   0.008  0.933  0.022  0.016  0.022    1.070   0.028

Table 24.2. Optimal portfolios for several choices of $\mu$.


This  estimate  for  Gold  gives  an  estimate  of  its  return  at  2.9  %,  which  is  much  more 
in  line  with  the  beliefs  of  experts  (at  least  in  1995). 
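
The log-averaging rule can be tried on the two-year example from the text. The
following is a minimal sketch, not the author's software: the function name
`weighted_log_mean` and the list layout (oldest return first) are assumptions made
here for illustration. With p = 1 all periods are weighted equally.

```python
import math


def weighted_log_mean(returns, p):
    """Exponentiated weighted average of log-returns.

    returns[t-1] is the return in period t = 1..T; the weight on period t
    is p**(T - t), so the most recent period carries the most weight.
    """
    T = len(returns)
    weights = [p ** (T - t) for t in range(1, T + 1)]
    log_avg = sum(wt * math.log(r) for wt, r in zip(weights, returns)) / sum(weights)
    return math.exp(log_avg)


returns = [2.0, 0.5]                      # doubles one year, halves the next
arithmetic = sum(returns) / len(returns)  # plain average of the two returns
flat = weighted_log_mean(returns, p=1.0)  # equal weights: the two years cancel
```

The plain average reports a gain even though the dollar ends where it started,
while the log-based estimate reports no gain.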

Table 24.2 shows the optimal portfolios for several choices of $\mu$. The
corresponding optimal values for the mean and standard deviation (which is defined as
the square root of the variance) are plotted in Figure 24.1. Letting $\mu$ vary
continuously generates a curve of optimal solutions. This curve is called the efficient
frontier. Any portfolio that produces a mean-variance combination that does not
lie on the efficient frontier can be improved either by increasing its mean without
changing the variance or by decreasing the variance without changing the mean.
Hence, one should only invest in portfolios that lie on the efficient frontier.

Of course, the optimal portfolios shown in Table 24.2 were obtained by solving
(24.1). The rest of this chapter is devoted to describing an algorithm for solving
quadratic programs such as this one.

2.  The  Dual 

We  have  seen  that  duality  plays  a  fundamental  role  in  our  understanding  and 
derivation  of  algorithms  for  linear  programming  problems.  The  same  is  true  for 
quadratic  programming.  Hence,  our  first  goal  is  to  figure  out  what  the  dual  of  a 
quadratic  programming  problem  should  be. 

Quadratic  programming  problems  are  usually  formulated  as  minimizations. 
Therefore,  we  shall  consider  problems  given  in  the  following  form: 

            minimize    $c^T x + \frac{1}{2} x^T Q x$
(24.3)      subject to  $Ax \ge b$
                        $x \ge 0.$

Of course, we may (and do) assume that the matrix Q is symmetric (see
Exercise 24.2). Note that we have also changed the sense of the inequality constraints
from our usual less-than to greater-than. This change is not particularly important;
its only purpose is to maintain a certain level of parallelism with past formulations
(that is, minimizations have always gone hand-in-hand with greater-than constraints,
while maximizations have been associated with less-than constraints).

Figure 24.1. The efficient frontier.

In Chapter 5, we derived the dual problem by looking for tight bounds on the
optimal solution to the primal problem. This approach could be followed here, but
it seems less compelling in the context of quadratic programming. A more direct
approach stems from the connection between duality and the first-order optimality
conditions for the barrier problem that we examined in Chapter 17. Indeed, let us
start by writing down the barrier problem associated with (24.3). To this end, we
introduce a nonnegative vector w of surplus variables and then subtract a barrier
term for each nonnegative variable to get the following barrier problem:

    minimize    $c^T x + \frac{1}{2} x^T Q x - \mu \sum_j \log x_j - \mu \sum_i \log w_i$
    subject to  $Ax - w = b.$

Next,  we  introduce  the  Lagrangian: 

    $f(x, w, y) = c^T x + \frac{1}{2} x^T Q x - \mu \sum_j \log x_j - \mu \sum_i \log w_i + y^T (b - Ax + w).$

The  first-order  optimality  conditions  for  the  barrier  problem  are  obtained  by 
differentiating  the  Lagrangian  with  respect  to  each  of  its  variables  and  setting  these 
derivatives  to  zero.  In  vector  notation,  setting  to  zero  the  derivative  with  respect  to 
the  x  variables  gives 

    $c + Qx - \mu X^{-1} e - A^T y = 0.$

Similarly, setting to zero the derivatives with respect to the w and y variables gives

    $-\mu W^{-1} e + y = 0$
    $b - Ax + w = 0,$

respectively.  As  we  did  in  our  study  of  linear  programming  problems,  we  now 
introduce  a  new  vector  z  given  by 


    $z = \mu X^{-1} e.$

With  this  definition,  the  first-order  optimality  conditions  can  be  summarized  as 

    $A^T y + z - Qx = c$
    $Ax - w = b$
    $XZe = \mu e$
    $YWe = \mu e.$

From the last two conditions, we see that the dual problem involves an n-vector
of variables z that are complementary to the primal variables x and an m-vector of
variables y that are complementary to the primal slack variables w. Because of these
complementarity conditions, we expect that the variables y and z are constrained to
be nonnegative in the dual problem. Also, to establish the proper connection
between the first-order optimality conditions and the dual problem, we must recognize
the first condition as a dual constraint. Hence, the constraints for the dual
problem are

    $A^T y + z - Qx = c$
    $y, z \ge 0.$

It is interesting to note that the dual constraints involve an n-vector x that seems as
if it should belong to the primal problem. This may seem odd, but when understood
properly it turns out to be entirely harmless. The correct interpretation is that the
variable x appearing in the dual has, in principle, no connection to the variable x
appearing in the primal (except that, as we shall soon see, at optimality they will be
equal).

The barrier problem has helped us write down the dual constraints, but it does
not shed any light on the dual objective function. To see what the dual objective
function should be, we look at what it needs to be for the weak duality theorem to
hold true. In the weak duality theorem, we assume that we have a primal feasible
solution (x, w) and a dual feasible solution (x, y, z). We then follow the obvious
chains of equalities:

    $y^T (Ax) = y^T (b + w) = b^T y + y^T w$

and

    $(A^T y)^T x = (c - z + Qx)^T x = c^T x - z^T x + x^T Q x.$

Now, since $y^T (Ax) = (A^T y)^T x$, we see that

    $0 \le y^T w + z^T x = c^T x + x^T Q x - b^T y$
    $\phantom{0 \le y^T w + z^T x} = (c^T x + \frac{1}{2} x^T Q x) - (b^T y - \frac{1}{2} x^T Q x).$

From this inequality, we see that the dual objective function is
$b^T y - \frac{1}{2} x^T Q x$. Hence, the dual problem can be stated now as

    maximize    $b^T y - \frac{1}{2} x^T Q x$
    subject to  $A^T y + z - Qx = c$
                $y, z \ge 0.$


For  linear  programming,  the  fundamental  connection  between  the  primal  and  dual 
problems  is  summarized  in  the  Complementary  Slackness  Theorem.  In  the  next 
section,  we  shall  derive  a  version  of  this  theorem  for  quadratic  programming. 


3.  Convexity  and  Complexity 

In linear programming, the dual problem is important because it provides a
certificate of optimality, as manifested in the Complementary Slackness Theorem. Under
certain conditions, the same is true here. Let us start by deriving the analogue of
the Complementary Slackness Theorem. The derivation begins with a reiteration of
the derivation of the Weak Duality Theorem. Indeed, let (x, w) denote a feasible
solution to the primal problem and let $(\bar{x}, y, z)$ denote a feasible solution to the dual
problem (we have put a bar on the dual x to distinguish it from the one appearing in
the primal). The chains of equalities that form the backbone of the proof of the Weak
Duality Theorem are, as always, obtained by writing $y^T A x$ two ways, namely,

    $y^T (Ax) = (A^T y)^T x,$

and then producing the obvious substitutions

    $y^T (Ax) = y^T (b + w) = b^T y + y^T w$

and

    $(A^T y)^T x = (c - z + Q\bar{x})^T x = c^T x - z^T x + \bar{x}^T Q x.$

Comparing the ends of these two chains and using the fact that both $y^T w$ and $z^T x$
are nonnegative, we see that

(24.4)    $0 \le y^T w + z^T x = c^T x + \bar{x}^T Q x - b^T y.$

So  far,  so  good. 
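
Inequality (24.4) is easy to check numerically on a small instance. The sketch below
uses hypothetical data (the matrices, the points, and the helper names `matvec`, `dot`,
and `transpose` are all assumptions chosen for illustration). It picks a primal feasible
point, builds a dual feasible point with a different dual x, and evaluates both sides.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def transpose(M):
    return [list(col) for col in zip(*M)]


# Hypothetical data: two variables, one constraint, Q positive definite.
Q = [[2.0, 1.0], [1.0, 2.0]]
c = [1.0, 2.0]
A = [[1.0, 1.0]]
b = [1.0]

# Primal feasible point: x >= 0 and w = Ax - b >= 0.
x = [0.5, 1.0]
w = [ax_i - b_i for ax_i, b_i in zip(matvec(A, x), b)]

# Dual feasible point: choose x_bar and y >= 0, then z = c + Q x_bar - A^T y.
x_bar = [1.0, 0.0]
y = [1.0]
z = [c_j + q_j - a_j
     for c_j, q_j, a_j in zip(c, matvec(Q, x_bar), matvec(transpose(A), y))]

lhs = dot(y, w) + dot(z, x)                               # y^T w + z^T x
rhs = dot(c, x) + dot(x_bar, matvec(Q, x)) - dot(b, y)    # c^T x + x_bar^T Q x - b^T y
```

Both sides agree and are nonnegative, as (24.4) requires, because w and z come out
nonnegative for this choice of points.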

Now, what about the Complementary Slackness Theorem? In the present
context, we expect this theorem to say roughly the following: given a solution (x*, w*)
that is feasible for the primal and a solution (x*, y*, z*) that is feasible for the dual,
if these solutions make inequality (24.4) into an equality, then the primal solution is
optimal for the primal problem and the dual solution is optimal for the dual problem.

Let’s  try  to  prove  this.  Let  (x,  w)  be  an  arbitrary  primal  feasible  solution.  Weak 
duality  applied  to  (x,  w)  on  the  primal  side  and  (x*,y*,  z*)  on  the  dual  side  says 
that 


    $c^T x + x^{*T} Q x - b^T y^* \ge 0.$

But for the specific primal feasible solution (x*, w*), this inequality is an equality:

    $c^T x^* + x^{*T} Q x^* - b^T y^* = 0.$

Combining these, we get

    $c^T x^* + x^{*T} Q x^* \le c^T x + x^{*T} Q x.$

This  is  close  to  what  we  want,  but  not  quite  it.  Recall  that  our  aim  is  to  show  that 
the  primal  objective  function  evaluated  at  x*  is  no  larger  than  its  value  at  x.  That  is, 

    $c^T x^* + \frac{1}{2} x^{*T} Q x^* \le c^T x + \frac{1}{2} x^T Q x.$

It  is  easy  to  get  from  the  one  to  the  other.  Starting  from  the  desired  left-hand  side, 
we  compute  as  follows: 

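    $c^T x^* + \frac{1}{2} x^{*T} Q x^* = c^T x^* + x^{*T} Q x^* - \frac{1}{2} x^{*T} Q x^*$
    $\qquad \le c^T x + x^{*T} Q x - \frac{1}{2} x^{*T} Q x^*$
    $\qquad = c^T x + \frac{1}{2} x^T Q x - \frac{1}{2} x^T Q x + x^{*T} Q x - \frac{1}{2} x^{*T} Q x^*$
    $\qquad = c^T x + \frac{1}{2} x^T Q x - \frac{1}{2} (x - x^*)^T Q (x - x^*).$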

The  last  step  in  the  derivation  is  to  drop  the  subtracted  term  on  the  right-hand  side  of 
the  last  expression.  We  can  do  this  if  the  quantity  being  subtracted  is  nonnegative. 
But  is  it?  In  general,  the  answer  is  no.  For  example,  if  Q  were  the  negative  of  the 
identity  matrix,  then  the  expression  (x  —  x*)TQ(x  —  x*)  would  be  negative  rather 
than  nonnegative. 

So it is here that we must impose a restriction on the class of quadratic
programming problems that we study. The correct assumption is that Q is positive
semidefinite. Recall from Chapter 19 that a matrix Q is positive semidefinite if

    $\xi^T Q \xi \ge 0$  for all $\xi \in \mathbb{R}^n$.

With  this  assumption,  we  can  finish  the  chain  of  inequalities  and  conclude  that 

    $c^T x^* + \frac{1}{2} x^{*T} Q x^* \le c^T x + \frac{1}{2} x^T Q x.$

Since  x  was  an  arbitrary  primal  feasible  point,  it  follows  that  x*  (together  with  w*) 
is  optimal  for  the  primal  problem.  A  similar  analysis  shows  that  y*  (together  with 
x*  and  z*)  is  optimal  for  the  dual  problem  (see  Exercise  24.4). 

A  quadratic  programming  problem  of  the  form  (24.3)  in  which  the  matrix  Q 
is  positive  semidefinite  is  called  a  convex  quadratic  programming  problem.  The 
discussion  given  above  can  be  summarized  in  the  following  theorem: 

THEOREM 24.1. For convex quadratic programming problems, given a solution
(x*, w*) that is feasible for the primal and a solution (x*, y*, z*) that is feasible for
the dual, if these solutions make inequality (24.4) into an equality, then the primal
solution is optimal for the primal problem and the dual solution is optimal for the
dual problem.

Figure 24.2. The objective function for (24.5) in the case where
n = 2.


To  see  how  bad  things  are  when  Q  is  not  positive  semidefinite,  consider  the 
following  example: 


            minimize    $\sum_j \left( x_j (1 - x_j) + c_j x_j \right)$
(24.5)
            subject to  $0 \le x_j \le 1, \quad j = 1, 2, \ldots, n.$


We assume that the coefficients, $c_j$, j = 1, 2, ..., n, are small. To be precise, we
assume that

    $|c_j| < 1, \quad j = 1, 2, \ldots, n.$


Let f(x) denote the value of the objective function at point x. Setting the gradient
to zero,

    $\nabla f(x) = e - 2x + c = 0,$

we see that there is one interior critical point. It is given by

    $x = (e + c)/2$

(the assumption that c is small guarantees that this x lies in the interior of the feasible
set: 0 < x < 1). However, this critical point is a local maximum, since the
matrix of second derivatives is $-2I$. The algebraic details are tedious, but if we look
at Figure 24.2, it is easy to be convinced that every vertex of the feasible set is a
local minimum. While this particular problem is easy to solve explicitly, it does
indicate the essential difficulty associated with nonconvex quadratic programming
problems: namely, for such problems one may need to check every vertex
individually, and there may be an exponential number of such vertices.
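
The claim about vertices can be checked by brute force for small n. The following is
an illustrative sketch with hypothetical coefficients c (any values with |c_j| < 1 would
do); the helper `is_local_min_vertex` is a name introduced here. Because the objective
of (24.5) is a sum of one-variable terms, it is enough to nudge one coordinate at a
time into the box.

```python
from itertools import product


def f(x, c):
    """Objective of (24.5): sum of x_j (1 - x_j) + c_j x_j."""
    return sum(xj * (1.0 - xj) + cj * xj for xj, cj in zip(x, c))


def is_local_min_vertex(v, c, eps=1e-3):
    """True if moving eps into the box along any single coordinate raises f."""
    base = f(v, c)
    for j in range(len(v)):
        nudged = list(v)
        nudged[j] = eps if v[j] == 0 else 1.0 - eps
        if f(nudged, c) <= base:
            return False
    return True


c = [0.3, -0.2, 0.1]                           # hypothetical, each |c_j| < 1
vertices = list(product([0, 1], repeat=len(c)))
local_min_count = sum(is_local_min_vertex(v, c) for v in vertices)
critical = [(1.0 + cj) / 2.0 for cj in c]      # the interior critical point
```

For n = 3 all eight vertices pass the test, and the interior critical point has a larger
objective value than every vertex. The number of vertices doubles with each added
variable, which is the exponential growth the text warns about.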

The situation for convex quadratic programming problems is much better, since
they inherit most of the properties that made linear programs efficiently solvable.
Indeed, in the next section, we derive an interior-point method for quadratic
programming problems.

4. Solution via Interior-Point Methods

In this section, we derive an interior-point method for quadratic programming
problems. We start from the first-order optimality conditions, which we saw in
Section 2 are given by

    $A^T y + z - Qx = c$
    $Ax - w = b$
    $XZe = \mu e$
    $YWe = \mu e.$

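Following the derivation given in Chapter 18, we replace (x, w, y, z) with
$(x + \Delta x, w + \Delta w, y + \Delta y, z + \Delta z)$ to get the following nonlinear system in
$(\Delta x, \Delta w, \Delta y, \Delta z)$:

    $A^T \Delta y + \Delta z - Q \Delta x = c - A^T y - z + Qx =: \sigma$
    $A \Delta x - \Delta w = b - Ax + w =: \rho$
    $Z \Delta x + X \Delta z + \Delta X \Delta Z e = \mu e - XZe$
    $W \Delta y + Y \Delta w + \Delta Y \Delta W e = \mu e - YWe.$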

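Next, we drop the nonlinear terms to get the following linear system for the step
directions $(\Delta x, \Delta w, \Delta y, \Delta z)$:

    $A^T \Delta y + \Delta z - Q \Delta x = \sigma$
    $A \Delta x - \Delta w = \rho$
    $Z \Delta x + X \Delta z = \mu e - XZe$
    $W \Delta y + Y \Delta w = \mu e - YWe.$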

Following  the  reductions  of  Chapter  19,  we  use  the  last  two  equations  to  solve  for 
Az  and  Aw  to  get 

    $\Delta z = X^{-1} (\mu e - XZe - Z \Delta x)$
    $\Delta w = Y^{-1} (\mu e - YWe - W \Delta y).$

We  then  use  these  expressions  to  eliminate  Az  and  Aw  from  the  remaining  two 
equations  in  the  system.  After  elimination,  we  arrive  at  the  following  reduced  KKT 
system : 

(24.6)    $A^T \Delta y - (X^{-1} Z + Q) \Delta x = \sigma - \mu X^{-1} e + z$
(24.7)    $A \Delta x + Y^{-1} W \Delta y = \rho + \mu Y^{-1} e - w.$

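Substituting in the definitions of $\rho$ and $\sigma$ and writing the system in matrix
notation, we get

    $\begin{bmatrix} -(X^{-1} Z + Q) & A^T \\ A & Y^{-1} W \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} c - A^T y - \mu X^{-1} e + Qx \\ b - Ax + \mu Y^{-1} e \end{bmatrix}.$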

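A summary of the algorithm is shown in Figure 24.3. It should be clear that
the quadratic term in the objective function plays a fairly small role. In fact, the
convergence analysis given in Chapter 18 can be easily adapted to yield analogous
results for quadratic programming problems (see Exercise 24.6).

initialize $(x, w, y, z) > 0$

while (not optimal) {

    $\rho = b - Ax + w$
    $\sigma = c - A^T y - z + Qx$
    $\gamma = z^T x + y^T w$
    $\mu = \delta \frac{\gamma}{n + m}$

    solve:

    $\begin{bmatrix} -(X^{-1} Z + Q) & A^T \\ A & Y^{-1} W \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} c - A^T y - \mu X^{-1} e + Qx \\ b - Ax + \mu Y^{-1} e \end{bmatrix}$

    $\Delta z = X^{-1} (\mu e - XZe - Z \Delta x)$
    $\Delta w = Y^{-1} (\mu e - YWe - W \Delta y)$

    $\theta = r \left( \max_{i,j} \left\{ -\frac{\Delta x_j}{x_j}, -\frac{\Delta w_i}{w_i}, -\frac{\Delta y_i}{y_i}, -\frac{\Delta z_j}{z_j} \right\} \right)^{-1} \wedge 1$

    $x \leftarrow x + \theta \Delta x, \quad w \leftarrow w + \theta \Delta w$
    $y \leftarrow y + \theta \Delta y, \quad z \leftarrow z + \theta \Delta z$

}

Figure 24.3. The path-following method for quadratic programming
problems.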
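
The loop in Figure 24.3 translates almost line by line into code. What follows is a
minimal sketch under stated assumptions, not the software that accompanies the
book: it uses dense Python lists, a hand-written Gaussian elimination
(`solve_linear`, a helper introduced here) in place of a sparse symmetric solver, an
all-ones starting point, and hypothetical parameter values delta = 0.1 and r = 0.9.

```python
def solve_linear(M, rhs):
    """Solve M u = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(M)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[r][k] -= factor * aug[col][k]
    u = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(aug[i][k] * u[k] for k in range(i + 1, size))
        u[i] = (aug[i][size] - tail) / aug[i][i]
    return u


def qp_path_following(Q, c, A, b, delta=0.1, r=0.9, tol=1e-8, max_iter=200):
    """Minimize c^T x + (1/2) x^T Q x subject to Ax >= b, x >= 0."""
    n, m = len(c), len(b)
    x, z = [1.0] * n, [1.0] * n
    w, y = [1.0] * m, [1.0] * m
    for _ in range(max_iter):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        ATy = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
        Qx = [sum(Q[j][k] * x[k] for k in range(n)) for j in range(n)]
        rho = [b[i] - Ax[i] + w[i] for i in range(m)]
        sigma = [c[j] - ATy[j] - z[j] + Qx[j] for j in range(n)]
        gamma = sum(zj * xj for zj, xj in zip(z, x)) + sum(yi * wi for yi, wi in zip(y, w))
        if max(abs(v) for v in rho + sigma + [gamma]) < tol:
            break
        mu = delta * gamma / (n + m)
        # Assemble the reduced KKT matrix [[-(X^-1 Z + Q), A^T], [A, Y^-1 W]].
        K = [[0.0] * (n + m) for _ in range(n + m)]
        for j in range(n):
            for k in range(n):
                K[j][k] = -Q[j][k]
            K[j][j] -= z[j] / x[j]
            for i in range(m):
                K[j][n + i] = A[i][j]
                K[n + i][j] = A[i][j]
        for i in range(m):
            K[n + i][n + i] = w[i] / y[i]
        rhs = [c[j] - ATy[j] - mu / x[j] + Qx[j] for j in range(n)]
        rhs += [b[i] - Ax[i] + mu / y[i] for i in range(m)]
        step = solve_linear(K, rhs)
        dx, dy = step[:n], step[n:]
        dz = [(mu - x[j] * z[j] - z[j] * dx[j]) / x[j] for j in range(n)]
        dw = [(mu - y[i] * w[i] - w[i] * dy[i]) / y[i] for i in range(m)]
        # Step length: shrink the step so every variable stays strictly positive.
        worst = max(-d / v for d, v in zip(dx + dw + dy + dz, x + w + y + z))
        theta = min(1.0, r / worst) if worst > 0 else 1.0
        x = [v + theta * d for v, d in zip(x, dx)]
        w = [v + theta * d for v, d in zip(w, dw)]
        y = [v + theta * d for v, d in zip(y, dy)]
        z = [v + theta * d for v, d in zip(z, dz)]
    return x, w, y, z


# Hypothetical test problem: minimize (1/2)(x1^2 + x2^2) s.t. x1 + x2 >= 2, x >= 0.
x, w, y, z = qp_path_following([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[1.0, 1.0]], [2.0])
```

For the test problem the minimizer is x = (1, 1) with dual variable y = 1, which can
be confirmed by hand from the first-order conditions. Setting Q to zero reduces the
loop to the linear programming version, which illustrates how small a role the
quadratic term plays.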

5.  Practical  Considerations 

For practical implementations of interior-point algorithms, we saw in
Chapter 19 that the difficulties created by dense rows/columns suggest that we solve the
reduced KKT system using an equation solver that can handle symmetric indefinite
systems (such as those described in Chapter 20). Quadratic programming problems
give us even more reason to prefer the reduced KKT system. To see why, let us
reduce the system further to get a feel for the normal equations for quadratic
programming.

If we use (24.6) to solve for $\Delta x$ and then eliminate it from (24.7), we get

    $\Delta x = -(X^{-1} Z + Q)^{-1} \left( c - A^T y + Qx - \mu X^{-1} e - A^T \Delta y \right)$

and the associated system of normal equations (in primal form):

    $\left( A (X^{-1} Z + Q)^{-1} A^T + Y^{-1} W \right) \Delta y = b - Ax + \mu Y^{-1} e$
    $\qquad + A (X^{-1} Z + Q)^{-1} \left( c - A^T y + Qx - \mu X^{-1} e \right).$

As we saw in Chapter 19, the most significant disadvantage of the normal equations
is that they could involve a dense matrix even when the original constraint matrix
is sparse. For quadratic programming, this disadvantage is even more pronounced.
Now the matrix of normal equations has the nonzero pattern of $A (D + Q)^{-1} A^T$,
where D is a diagonal matrix. If Q is a diagonal matrix, then this matrix appearing
between A and $A^T$ is diagonal, and the system has the same structure as we saw for
linear programming. But if Q is not a diagonal matrix, then all hope for any sparsity
in $A (D + Q)^{-1} A^T$ is lost.

Fortunately, however, the dual form of the normal equations is likely to retain
some sparsity. Indeed, to derive the dual form, we use (24.7) to solve for $\Delta y$ and
then eliminate it from (24.6). The result is

    $\Delta y = Y W^{-1} \left( b - Ax + \mu Y^{-1} e - A \Delta x \right)$

and

    $-\left( X^{-1} Z + Q + A^T Y W^{-1} A \right) \Delta x = c - A^T y + Qx - \mu X^{-1} e$
    $\qquad - A^T Y W^{-1} \left( b - Ax + \mu Y^{-1} e \right).$

Now the matrix has a nonzero pattern of $A^T A + Q$. This pattern is much more likely
to be sparse than the pattern we had above.

As mentioned earlier, there is significantly less risk of fill-in if Q is diagonal.
A quadratic programming problem for which Q is diagonal is called a separable
quadratic programming problem. It turns out that every nonseparable quadratic
programming problem can be replaced by an equivalent separable version, and
sometimes this replacement results in a problem that can be solved dramatically faster
than the original nonseparable problem. The trick reveals itself when we remind
ourselves that the problems we are studying are convex quadratic programs, and so
we ask the question: how do we know that the matrix Q is positive semidefinite? Or,
more to the point, how does the creator of the model know that Q is positive
semidefinite? There are many equivalent characterizations of positive semidefiniteness,
but the one that is easiest to check is the one that says that Q is positive semidefinite
if and only if it can be factored as follows:

    $Q = F^T D F.$

Here F is a k × n matrix and D is a k × k diagonal matrix having all nonnegative
diagonal entries. In fact, the model creator often started with F and D and then
formed Q by multiplying. In these cases, the matrix F will generally be less dense
than Q. And if k is substantially less than n, then the following substitution is almost
guaranteed to dramatically improve the solution time. Introduce new variables y by
setting

    $y = Fx.$

With  this  definition,  the  nonseparable  quadratic  programming  problem  (24.3)  can 
be  replaced  by  the  following  equivalent  separable  one: 

    minimize    $c^T x + \frac{1}{2} y^T D y$
    subject to  $Ax \ge b$
                $Fx - y = 0$
                $x \ge 0.$

The cost of separation is the addition of k new constraints. As we said before, if k
is small and/or F is sparse, then we can expect this formulation to be solved more
efficiently.
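
The substitution rests on the identity $x^T Q x = y^T D y$ whenever $Q = F^T D F$
and $y = Fx$. A small numerical check, using hypothetical values for F, D, and x
chosen only for illustration, looks like this:

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical factor data: k = 2 rows, n = 3 columns.
F = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]
d = [3.0, 0.5]                      # diagonal entries of D, all nonnegative

# Form the dense matrix Q = F^T D F explicitly.
k, n = len(F), len(F[0])
Q = [[sum(F[r][i] * d[r] * F[r][j] for r in range(k)) for j in range(n)]
     for i in range(n)]

x = [1.0, -2.0, 0.5]
y = matvec(F, x)                    # the new variables y = F x

quad_dense = dot(x, matvec(Q, x))                        # x^T Q x
quad_separable = sum(dr * yr * yr for dr, yr in zip(d, y))  # y^T D y
```

The dense form touches all n × n entries of Q, while the separable form needs only
the k squared terms once y is available.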

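To illustrate this trick, let us return to the Markowitz model. Recall that the
quadratic terms in this model come from the variance of the portfolio's return, which
is given by

    $\mathrm{Var}(R) = \mathbb{E} \left( \sum_j x_j \tilde{R}_j \right)^2 = \sum_{t=1}^{T} p(t) \left( \sum_j x_j \tilde{R}_j(t) \right)^2.$

Here,

    $p(t) = \frac{p^{T-t}}{\sum_{s=1}^{T} p^{T-s}}$

for t = 1, 2, ..., T, and

    $\tilde{R}_j(t) = R_j(t) - \sum_{s=1}^{T} p(s) R_j(s).$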


If we introduce the variables

    $y(t) = \sum_j x_j \tilde{R}_j(t), \quad t = 1, 2, \ldots, T,$

then we get the following separable version of the Markowitz model:

    maximize    $\sum_j x_j \mathbb{E} R_j - \mu \sum_{t=1}^{T} p(t) y(t)^2$
    subject to  $\sum_j x_j = 1$
                $y(t) = \sum_j x_j \tilde{R}_j(t), \quad t = 1, 2, \ldots, T,$
                $x_j \ge 0, \quad j = 1, 2, \ldots, n.$

Using specific data involving 500 possible investments and 20 historical time
periods, the separable version solves 60 times faster than the nonseparable version using
a QP-solver called LOQO.
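
The two ways of computing the portfolio variance can be compared directly. The
sketch below is illustrative only: the function name `weighted_variance`, the
portfolio weights (0.4, 0.6), and the choice of series are assumptions. The two series
are the first and third data columns of the 1985 to 1994 rows of Table 24.1, and
p = 0.9 as in the text.

```python
def weighted_variance(x, returns, p):
    """Return (variance via y(t), variance via the dense covariance matrix).

    returns[j][t-1] is R_j(t); weights are p(t) = p**(T-t) / sum_s p**(T-s).
    """
    T = len(returns[0])
    raw = [p ** (T - t) for t in range(1, T + 1)]
    total = sum(raw)
    pt = [v / total for v in raw]
    # Center each return series at its weighted mean.
    centered = []
    for series in returns:
        mean = sum(pw * rv for pw, rv in zip(pt, series))
        centered.append([rv - mean for rv in series])
    n = len(x)
    # Separable form: y(t) = sum_j x_j * Rtilde_j(t), then sum_t p(t) y(t)^2.
    y = [sum(x[j] * centered[j][t] for j in range(n)) for t in range(T)]
    var_separable = sum(pw * yt * yt for pw, yt in zip(pt, y))
    # Dense form: x^T C x with C_jk = sum_t p(t) Rtilde_j(t) Rtilde_k(t).
    C = [[sum(pt[t] * centered[j][t] * centered[k][t] for t in range(T))
          for k in range(n)] for j in range(n)]
    var_dense = sum(x[j] * C[j][k] * x[k] for j in range(n) for k in range(n))
    return var_separable, var_dense


first_column = [1.080, 1.063, 1.061, 1.071, 1.087, 1.080, 1.057, 1.036, 1.031, 1.045]
third_column = [1.316, 1.186, 1.052, 1.165, 1.316, 0.968, 1.304, 1.076, 1.100, 1.012]
var_sep, var_dense = weighted_variance([0.4, 0.6], [first_column, third_column], p=0.9)
```

With n investments and T periods the dense form needs an n × n matrix, whereas
the separable form needs only T auxiliary values, which is the source of the speedup
when T is much smaller than n.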


Exercises 

24.1  Show that the gradient of the function

          $f(x) = \frac{1}{2} x^T Q x$

      is given by

          $\nabla f(x) = Qx.$

24.2  Suppose that Q is an n × n matrix that is not necessarily symmetric. Let
      $\tilde{Q} = \frac{1}{2} (Q + Q^T)$. Show that

      (a)  $x^T Q x = x^T \tilde{Q} x$, for every $x \in \mathbb{R}^n$, and

      (b)  $\tilde{Q}$ is symmetric.

24.3  Penalty  Methods. 

(a)  Consider the following problem:

         minimize    $\frac{1}{2} x^T Q x$
         subject to  $Ax = b,$

     where Q is symmetric, positive semidefinite, and invertible (these
     last two conditions are equivalent to saying that Q is positive
     definite). By solving the first-order optimality conditions, give an
     explicit formula for the solution to this problem.

(b)  Each equality constraint in the above problem can be replaced by a
     penalty term added to the objective function. Penalty terms should be
     small when the associated constraint is satisfied and become rapidly
     larger as it becomes more and more violated. One choice of penalty
     function is the quadratic function. The quadratic penalty problem is
     defined as follows:

         minimize    $\frac{1}{2} x^T Q x + \frac{\lambda}{2} (b - Ax)^T (b - Ax),$

     where $\lambda$ is a large real-valued parameter. Derive an explicit formula
     for the solution to this problem.

(c)  Show that, in the limit as $\lambda$ tends to infinity, the solution to the
     quadratic penalty problem converges to the solution to the original
     problem.

24.4  Consider a convex quadratic programming problem. Suppose that (x*, w*)
      is a feasible solution for the primal and that (x*, y*, z*) is a feasible
      solution for the dual. Suppose further that these solutions make
      inequality (24.4) into an equality. Show that the dual solution is optimal for the
      dual problem.

24.5  A real-valued function f defined on $\mathbb{R}^n$ is called convex if, for every
      $x, y \in \mathbb{R}^n$, and for every 0 < t < 1,

          $f(tx + (1 - t)y) \le t f(x) + (1 - t) f(y).$

      Show that the function

          $f(x) = c^T x + \frac{1}{2} x^T Q x, \quad x \in \mathbb{R}^n,$

      is convex if Q is positive semidefinite.

24.6  Extend the convergence analysis given in Chapter 18 so that it applies to
      convex quadratic programming problems, and identify in particular any
      steps that depend on Q being positive semidefinite.

24.7  Consider the quadratic programming problem given in the following form:

          minimize    $c^T x + \frac{1}{2} x^T Q x$
          subject to  $Ax \ge b,$

      (i.e., without assuming nonnegativity of the x vector). Show that the
      formulas for the step directions $\Delta x$ and $\Delta y$ are given by the following
      reduced KKT system:

(24.8)    $\begin{bmatrix} -Q & A^T \\ A & W Y^{-1} \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} c - A^T y + Qx \\ b - Ax + \mu Y^{-1} e \end{bmatrix}.$

Notes 

The  portfolio  optimization  model  presented  in  Section  24.1  was  first  introduced 
by  Markowitz  (1959).  He  received  the  1990  Nobel  Prize  in  Economics  for  this  work. 

Quadratic programming is the simplest class of problems from the subject
called nonlinear programming. Two excellent recent texts that cover nonlinear
programming are those by Bertsekas (1995) and Nash and Sofer (1996). The first paper
that extended the path-following method to quadratic programming was Monteiro
and Adler (1989). The presentation given here follows Vanderbei (1999).


CHAPTER  25 


Convex  Programming 


In  the  last  chapter,  we  saw  that  small  modifications  to  the  primal-dual 
interior-point  algorithm  allow  it  to  be  applied  to  quadratic  programming  problems 
as  long  as  the  quadratic  objective  function  is  convex.  In  this  chapter,  we  shall  go 
further  and  allow  the  objective  function  to  be  a  general  (smooth)  convex  function. 
In  addition,  we  shall  allow  the  feasible  region  to  be  any  convex  set  given  by  a  finite 
collection  of  convex  inequalities. 


1.  Differentiable  Functions  and  Taylor  Approximations 


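In this chapter, all nonlinear functions will be assumed to be twice
differentiable, and the second derivatives will be assumed continuous. We begin by
reiterating a few definitions and results that were briefly touched on in Chapter 17. First
of all, given a real-valued function f defined on a domain in $\mathbb{R}^n$, the vector

    $\nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(x) \\ \frac{\partial f}{\partial x_2}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x) \end{bmatrix}$

is called the gradient of f at x. The matrix

    $Hf(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2}(x) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(x) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(x) & \frac{\partial^2 f}{\partial x_2^2}(x) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(x) \\ \vdots & \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(x) & \frac{\partial^2 f}{\partial x_n \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(x) \end{bmatrix}$

is called the Hessian of f at x. In dimensions greater than one, the gradient and the
Hessian are the analogues of the first and second derivatives of a function in one
dimension. In particular, they appear in the three-term Taylor series expansion of f
about the point x:

    $f(x + \Delta x) = f(x) + \nabla f(x)^T \Delta x + \frac{1}{2} \Delta x^T Hf(x) \Delta x + r_x(\Delta x).$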

The last term is called the remainder term. The value of this expansion lies in the
fact that this remainder is small when $\Delta x$ is small. To be precise, the remainder has
the following property:

    $\lim_{\Delta x \to 0} \frac{r_x(\Delta x)}{\|\Delta x\|^2} = 0.$

This result follows immediately from the one-dimensional three-term Taylor series
expansion applied to $g(t) = f(x + t \Delta x)$ and the chain rule (see Exercise 25.8).
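
The remainder property can be observed numerically. The sketch below uses a
hypothetical test function $f(x_1, x_2) = e^{x_1} + x_1 x_2^2$, chosen here only because
its gradient and Hessian are easy to write by hand; the base point and the direction
are likewise assumptions. It shrinks the step along a fixed direction and records the
ratio of the remainder to the squared step length.

```python
import math


def f(x1, x2):
    return math.exp(x1) + x1 * x2 ** 2


def quadratic_model(x1, x2, d1, d2):
    """Three-term Taylor model of f about (x1, x2), evaluated at step (d1, d2)."""
    grad = (math.exp(x1) + x2 ** 2, 2.0 * x1 * x2)
    H = ((math.exp(x1), 2.0 * x2),
         (2.0 * x2, 2.0 * x1))
    linear = grad[0] * d1 + grad[1] * d2
    quad = 0.5 * (d1 * (H[0][0] * d1 + H[0][1] * d2)
                  + d2 * (H[1][0] * d1 + H[1][1] * d2))
    return f(x1, x2) + linear + quad


def remainder_ratio(t, x=(0.0, 1.0), direction=(1.0, 1.0)):
    """|r_x(dx)| / ||dx||^2 for the step dx = t * direction."""
    d1, d2 = t * direction[0], t * direction[1]
    r = f(x[0] + d1, x[1] + d2) - quadratic_model(x[0], x[1], d1, d2)
    return abs(r) / (d1 * d1 + d2 * d2)


ratios = [remainder_ratio(t) for t in (1e-1, 1e-2, 1e-3)]
```

Each tenfold reduction in the step reduces the ratio by roughly a factor of ten, which
is consistent with a remainder of third order in the step.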


2.  Convex  and  Concave  Functions 

There are several equivalent definitions of convexity of a function. The
definition that is most expedient for our purposes is the multidimensional generalization
of the statement that a function is convex if its second derivative is nonnegative.
Hence, we say that a real-valued function defined on a domain in $\mathbb{R}^n$ is convex if
its Hessian is positive semidefinite everywhere in its domain. A function is called
concave if its negation is convex.
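
For functions of two variables with a constant Hessian, the definition can be tested
directly. The sketch below relies on the standard fact that a symmetric 2 × 2 matrix
is positive semidefinite exactly when both diagonal entries and the determinant are
nonnegative; the helper name `is_psd_2x2` and the two example functions are
assumptions introduced for illustration.

```python
def is_psd_2x2(H, tol=1e-12):
    """Positive semidefiniteness test for a symmetric 2x2 matrix."""
    (a, b), (b2, d) = H
    if abs(b - b2) > tol:
        raise ValueError("matrix must be symmetric")
    return a >= -tol and d >= -tol and a * d - b * b >= -tol


hess_convex = [[2.0, 1.0], [1.0, 2.0]]   # Hessian of x1^2 + x1*x2 + x2^2
hess_saddle = [[0.0, 1.0], [1.0, 0.0]]   # Hessian of x1*x2

convex_ok = is_psd_2x2(hess_convex)
saddle_ok = is_psd_2x2(hess_saddle)
```

The first function is convex under the definition above; the second is neither convex
nor concave, since neither its Hessian nor the negated Hessian passes the test.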


3.  Problem  Formulation 

We shall study convex optimization problems posed in the following form:

    minimize    $c(x)$
    subject to  $a_i(x) \ge b_i, \quad i = 1, 2, \ldots, m.$

Here, the real-valued function $c(\cdot)$ is assumed to be convex, and the m real-valued
functions $a_i(\cdot)$ are assumed to be concave. This formulation is the natural
extension of the convex quadratic programming problem studied in the previous chapter,
except that we have omitted the nonnegativity constraints on the variables. This
omission is only a matter of convenience since, if a given problem involves
nonnegative variables, the assertion of their nonnegativity can be incorporated as part
of the m nonlinear inequality constraints. Also note that once we allow for general
concave inequality constraints, we can take the right-hand sides to be zero by
simply incorporating appropriate shifts into the nonlinear constraint functions. Hence,
many texts on convex optimization prefer to formulate the constraints in the form
$a_i(x) \ge 0$. We have left the constants $b_i$ on the right-hand side for later comparisons
with the quadratic programming problem of the previous chapter. Finally, note that
many convex and concave functions become infinite in places and therefore have a
natural domain that is a strict subset of $\mathbb{R}^n$. This issue is important to address when
solving practical problems, but since this chapter is just an introduction to convex
optimization, we shall assume that all functions are finite on all of $\mathbb{R}^n$.

At times it will be convenient to use vector notation to consolidate the m
constraints into a single inequality. Hence, we sometimes express the problem as

    minimize    $c(x)$
    subject to  $A(x) \ge b,$

where $A(\cdot)$ is a function from $\mathbb{R}^n$ into $\mathbb{R}^m$ and b is a vector in $\mathbb{R}^m$. As usual, we let
w denote the slack variables that convert the inequality constraints to equalities:

    minimize    $c(x)$
    subject to  $A(x) - w = b$
                $w \ge 0.$


4. Solution via Interior-Point Methods

In this section, we derive an interior-point method for convex programming
problems. We start by introducing the associated barrier problem:

    minimize    $c(x) - \mu \sum_i \log w_i$
    subject to  $a_i(x) - w_i = b_i, \quad i = 1, 2, \ldots, m.$

The Lagrangian for this problem is given by

    $L(x, w, y) = c(x) - \mu \sum_i \log w_i + \sum_i y_i \left( b_i - a_i(x) + w_i \right).$

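Equating to zero the derivative of L with respect to each of its variables, we get the
following set of first-order optimality conditions:

    $\frac{\partial L}{\partial x_j} = \frac{\partial c}{\partial x_j}(x) - \sum_i y_i \frac{\partial a_i}{\partial x_j}(x) = 0, \quad j = 1, 2, \ldots, n,$
    $\frac{\partial L}{\partial w_i} = -\frac{\mu}{w_i} + y_i = 0, \quad i = 1, 2, \ldots, m,$
    $\frac{\partial L}{\partial y_i} = b_i - a_i(x) + w_i = 0, \quad i = 1, 2, \ldots, m.$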

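The next step is to multiply the $i$th equation in the middle set by $w_i$ and then replace
$x$ with $x + \Delta x$, $y$ with $y + \Delta y$, and $w$ with $w + \Delta w$ to get the following system:

$$
\begin{aligned}
\frac{\partial c}{\partial x_j}(x + \Delta x) - \sum_i (y_i + \Delta y_i) \frac{\partial a_i}{\partial x_j}(x + \Delta x) &= 0, & j &= 1, 2, \ldots, n, \\
-\mu + (w_i + \Delta w_i)(y_i + \Delta y_i) &= 0, & i &= 1, 2, \ldots, m, \\
b_i - a_i(x + \Delta x) + w_i + \Delta w_i &= 0, & i &= 1, 2, \ldots, m.
\end{aligned}
$$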


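Now we view this set of equations as a nonlinear system in the "delta" variables
and linearize it by replacing each nonlinear function with its two-term Taylor series
approximation. For example, $\partial c / \partial x_j (x + \Delta x)$ gets replaced with

$$
\frac{\partial c}{\partial x_j}(x + \Delta x) \approx \frac{\partial c}{\partial x_j}(x) + \sum_k \frac{\partial^2 c}{\partial x_j \partial x_k}(x) \, \Delta x_k.
$$

Similarly, $\partial a_i / \partial x_j (x + \Delta x)$ gets replaced with

$$
\frac{\partial a_i}{\partial x_j}(x + \Delta x) \approx \frac{\partial a_i}{\partial x_j}(x) + \sum_k \frac{\partial^2 a_i}{\partial x_j \partial x_k}(x) \, \Delta x_k.
$$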
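Writing the resulting linear system with the delta-variable terms on the left and
everything else on the right (and dropping products of two delta variables), we get

$$
\begin{aligned}
\sum_k \left( -\frac{\partial^2 c}{\partial x_j \partial x_k} + \sum_i y_i \frac{\partial^2 a_i}{\partial x_j \partial x_k} \right) \Delta x_k + \sum_i \frac{\partial a_i}{\partial x_j} \Delta y_i &= \frac{\partial c}{\partial x_j} - \sum_i y_i \frac{\partial a_i}{\partial x_j}, \\
y_i \Delta w_i + w_i \Delta y_i &= \mu - w_i y_i, \\
\sum_j \frac{\partial a_i}{\partial x_j} \Delta x_j - \Delta w_i &= b_i - a_i + w_i.
\end{aligned}
$$

(Note that we have omitted the indication that the functions $c$, $a_i$, and their derivatives
are to be evaluated at $x$.)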

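As usual, the next step is to solve the middle set of equations for the $\Delta w_i$'s and
then to eliminate them from the system. The reduced system then becomes

$$
\begin{aligned}
\sum_k \left( -\frac{\partial^2 c}{\partial x_j \partial x_k} + \sum_i y_i \frac{\partial^2 a_i}{\partial x_j \partial x_k} \right) \Delta x_k + \sum_i \frac{\partial a_i}{\partial x_j} \Delta y_i &= \frac{\partial c}{\partial x_j} - \sum_i y_i \frac{\partial a_i}{\partial x_j}, \\
\sum_j \frac{\partial a_i}{\partial x_j} \Delta x_j + \frac{w_i}{y_i} \Delta y_i &= b_i - a_i + \frac{\mu}{y_i},
\end{aligned}
$$

and the equations for the $\Delta w_i$'s are

$$
\Delta w_i = -\frac{w_i}{y_i} \Delta y_i + \frac{\mu}{y_i} - w_i, \qquad i = 1, 2, \ldots, m.
$$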

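At this point it is convenient to put the equations into matrix form. If we generalize
our familiar gradient notation by letting $\nabla A(x)$ denote the $m \times n$ matrix whose
$(i, j)$th entry is $\partial a_i / \partial x_j (x)$, then we can write the above system succinctly as follows:

$$
\begin{bmatrix} -Hc(x) + \sum_i y_i Ha_i(x) & \nabla A(x)^T \\ \nabla A(x) & W Y^{-1} \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
=
\begin{bmatrix} \nabla c(x) - \nabla A(x)^T y \\ b - A(x) + \mu Y^{-1} e \end{bmatrix}.
\tag{25.1}
$$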

Now that we have step directions, the algorithm is easy to describe: just compute
step lengths that preserve strict positivity of the $w_i$'s and the $y_i$'s, step to a new
point, and iterate.
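
The step-length rule just described can be sketched in C as follows. This is only an illustration, not code from the book's software: the function name `step_length` and the safety factor `r` (a value such as 0.95 that keeps the new point strictly interior) are assumptions introduced here.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns a step length theta in (0, 1] such that
 * w + theta*dw > 0 and y + theta*dy > 0, given w > 0 and y > 0.
 * r is an assumed safety factor with 0 < r < 1.
 */
double step_length(const double *w, const double *dw,
                   const double *y, const double *dy,
                   size_t m, double r)
{
    double worst = 0.0;  /* largest of -dw[i]/w[i] and -dy[i]/y[i] */
    size_t i;

    for (i = 0; i < m; i++) {
        if (-dw[i] / w[i] > worst) worst = -dw[i] / w[i];
        if (-dy[i] / y[i] > worst) worst = -dy[i] / y[i];
    }
    if (worst <= 0.0) {
        return 1.0;      /* no component decreases: take the full step */
    }
    /* 1/worst is the step to the boundary; back off by r and cap at 1 */
    return (r / worst < 1.0) ? r / worst : 1.0;
}
```

Calling `step_length(w, dw, y, dy, m, 0.95)` and then adding `theta` times each direction to the current point gives one way to carry out the "step to a new point" part of the iteration.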


5.  Successive  Quadratic  Approximations 

It  is  instructive  to  notice  the  similarity  between  the  system  given  above  and  the 
analogous  system  for  the  quadratic  programming  problem  posed  in  the  analogous 
form  (see  Exercise  24.7).  Indeed,  a  careful  matching  of  terms  reveals  that  the  step 
directions  derived  here  are  exactly  those  that  would  be  obtained  if  one  were  to  form 
a  certain  quadratic  approximation  at  the  beginning  of  each  iteration  of  the  interior- 
point  algorithm.  Hence,  the  interior-point  method  can  be  thought  of  as  a  successive 
quadratic  programming  algorithm.  In  order  to  write  this  quadratic  approximation 
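neatly, let $\bar{x}$ and $\bar{y}$ denote the current primal and dual variables, respectively. Then
the quadratic approximation can be written as

$$
\begin{array}{ll}
\text{minimize} & c(\bar{x}) + \nabla c(\bar{x})^T (x - \bar{x}) + \frac{1}{2} (x - \bar{x})^T Hc(\bar{x}) (x - \bar{x}) \\
& \quad - \frac{1}{2} (x - \bar{x})^T \left( \sum_i \bar{y}_i Ha_i(\bar{x}) \right) (x - \bar{x}) \\
\text{subject to} & A(\bar{x}) + \nabla A(\bar{x}) (x - \bar{x}) \ge b.
\end{array}
$$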

To verify the equivalence, we first observe that this problem is a quadratic program
whose linear objective coefficients are given by

$$
\nabla c(\bar{x}) - Hc(\bar{x}) \bar{x} + \left( \sum_i \bar{y}_i Ha_i(\bar{x}) \right) \bar{x},
$$

whose quadratic objective coefficients are given by

$$
Hc(\bar{x}) - \sum_i \bar{y}_i Ha_i(\bar{x}),
$$

and whose right-hand side vector is given by

$$
b - A(\bar{x}) + \nabla A(\bar{x}) \bar{x}.
$$

Substituting these expressions into the appropriate places in (24.8), we get (25.1).

Looking at the quadratic terms in the objective of the quadratic programming
approximation, we see that the objective is convex, since we assumed at the start
that $c$ is convex, each $a_i$ is concave, and the dual variables multiplying the Hessians
of the constraint functions are all strictly positive.
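
The convexity claim can be checked in one line. The following is a sketch under the chapter's standing assumptions (twice-differentiable $c$ convex, each $a_i$ concave, each $y_i > 0$); the vector $v$ is an arbitrary direction introduced here for illustration:

$$
v^T \left( Hc(x) - \sum_i y_i Ha_i(x) \right) v
= \underbrace{v^T Hc(x) v}_{\ge 0} + \sum_i \underbrace{y_i}_{> 0} \, \underbrace{\left( -v^T Ha_i(x) v \right)}_{\ge 0} \ge 0.
$$

The first term is nonnegative because the Hessian of a convex function is positive semidefinite, and each remaining term is nonnegative because the Hessian of a concave function is negative semidefinite.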


6.  Merit  Functions 


It  is  perhaps  a  little  late  to  bring  this  up,  but  here’s  a  small  piece  of  advice: 
always  test  your  knowledge  on  the  simplest  possible  example.  With  that  in  mind, 
consider the following trivial convex optimization problem:

$$
\text{minimize} \quad \sqrt{1 + x^2}.
$$

This problem has no constraints. Looking at the graph of the objective function,
which looks like a smoothed out version of $|x|$, we see that the optimal solution is
$x^* = 0$. What could be easier! There are no $y_i$'s nor any $w_i$'s and equation (25.1)
becomes just

$$
-Hc(x) \, \Delta x = \nabla c(x),
$$

where $c(x) = \sqrt{1 + x^2}$. Taking the first and second derivatives, we get

$$
\nabla c(x) = \frac{x}{\sqrt{1 + x^2}} \quad \text{and} \quad Hc(x) = \frac{1}{(1 + x^2)^{3/2}}.
$$

Substituting these expressions into the equation for $\Delta x$ and simplifying, we get that

$$
\Delta x = -x(1 + x^2).
$$
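
For readers who want the intermediate step, the simplification can be written out as follows (a worked restatement using only the derivatives of $c(x) = \sqrt{1 + x^2}$):

$$
\Delta x = -\frac{\nabla c(x)}{Hc(x)} = -\frac{x}{(1 + x^2)^{1/2}} \cdot (1 + x^2)^{3/2} = -x (1 + x^2).
$$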

Since there are no nonnegative variables that need to be kept positive, we can take
unshortened steps. Hence, letting $x^{(k)}$ denote our current point and $x^{(k+1)}$ denote
the next point, we have that

$$
x^{(k+1)} = x^{(k)} + \Delta x = -\left( x^{(k)} \right)^3.
$$
That is, the algorithm says to start at any point $x^{(0)}$ and then replace this point with
the negative of its cube, replace that with the negative of its cube, and so on.

The question is: does this sequence converge to zero? It is easy to see that
the answer is yes if $|x^{(0)}| < 1$ but no otherwise. For example, if we start with
$x^{(0)} = 1/2$, then the sequence of iterates is

    k    x^(k)
    0     0.50000000
    1    -0.12500000
    2     0.00195313
    3    -0.00000001

If, on the other hand, we start at $x^{(0)} = 2$, then we get the following wildly divergent
sequence:

    k    x^(k)
    0               2
    1              -8
    2             512
    3    -134,217,728

Here  is  what  goes  wrong  in  this  example.  For  problems  without  constraints,  our 
algorithm  has  an  especially  simple  description: 

    From the current point, use the first three terms of a Taylor
    series expansion to make a quadratic approximation to the objective
    function. The next point is the minimum of this quadratic
    approximation function.

Figure 25.1 shows a graph of the objective function together with the quadratic
approximation at $x^{(0)} = 2$. It is easy to see that the next iterate is at $-8$. Also, the
further from zero that one starts, the more the function looks like a straight line and
hence the further the minimum will be to the other side.

How  do  we  remedy  this  nonconvergence?  The  key  insight  is  the  observation 
that the steps are always in the correct direction (i.e., a descent direction) but they
are too long; we need to shorten them. A standard technique for shortening steps
in situations like this is to introduce a function called a merit function and to shorten
steps as needed to ensure that this merit function is always monotonically decreasing.
For the example above, and in fact for any unconstrained optimization problem,
we can use the objective function itself as the merit function. But, for problems with
constraints, one needs to use something a little different from just the objective function.
For example, one can use the logarithmic barrier function plus a constant times
the square of the Euclidean norm of the infeasibility vector:

$$
\Psi(x, w) := c(x) - \mu \sum_i \log(w_i) + \beta \| b - A(x) + w \|_2^2.
$$

Here, $\beta$ is a positive real number. One can show that for $\beta$ sufficiently large the step
directions are always descent directions for this merit function.

A  summary  of  the  algorithm  is  shown  in  Figure  25.2. 
Figure 25.1. The function $c(x) = \sqrt{1 + x^2}$ and its quadratic
approximation at $x = 2$.


7.  Parting  Words 

A  story  is  never  over,  but  every  book  must  have  an  end.  So,  we  stop  here 
mindful of the fact that there are many interesting things left unsaid and topics
unexplored. We hope we have motivated the reader to pursue the story further without
our  assistance — by  reading  other  books  and/or  research  papers  and  even  perhaps 
making  his  or  her  own  contributions.  Cheers. 


Exercises 


25.1 Piecewise Linear Approximation. Given real numbers $b_1 < b_2 < \cdots < b_k$,
let $f$ be a continuous function on $\mathbb{R}$ that is linear on each interval
$[b_i, b_{i+1}]$, $i = 0, 1, \ldots, k$ (for convenience we let $b_0 = -\infty$ and $b_{k+1} = \infty$).
Such a function is called piecewise linear and the numbers $b_i$ are
called breakpoints. Piecewise linear functions are often used to approximate
(continuous) nonlinear functions. The purpose of this exercise is to
show how and why.

(a) Every piecewise linear function can be written as a sum of a constant
plus a linear term plus a sum of absolute value terms:

$$
f(x) = d + a_0 x + \sum_{i=1}^{k} a_i |x - b_i|.
$$

Let $c_i$ denote the slope of $f$ on the interval $[b_i, b_{i+1}]$. Derive an
explicit expression for each of the $a_i$'s (including $a_0$) in terms of
the $c_i$'s.
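initialize $(x, w, y)$ so that $(w, y) > 0$

while (not optimal) {

    set up QP subproblem:
        $A = \nabla A(x)$
        $b = b - A(x) + \nabla A(x) x$
        $c = \nabla c(x) - Hc(x) x + \left( \sum_i y_i Ha_i(x) \right) x$
        $Q = Hc(x) - \sum_i y_i Ha_i(x)$

    $\rho = b - Ax + w$
    $\sigma = c - A^T y + Qx$
    $\gamma = y^T w$
    $\mu = \delta \dfrac{\gamma}{n + m}$

    solve:
        $\begin{bmatrix} -Q & A^T \\ A & Y^{-1} W \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} c - A^T y + Qx \\ b - Ax + \mu Y^{-1} e \end{bmatrix}$

    $\Delta w = Y^{-1} (\mu e - YWe - W \Delta y)$

    $\theta = r \left( \max_{ij} \left\{ -\dfrac{\Delta x_j}{x_j}, -\dfrac{\Delta w_i}{w_i}, -\dfrac{\Delta y_i}{y_i} \right\} \right)^{-1} \wedge 1$

    do {
        $x^{\text{new}} = x + \theta \Delta x$
        $w^{\text{new}} = w + \theta \Delta w$
        $y^{\text{new}} = y + \theta \Delta y$
        $\theta \leftarrow \theta / 2$
    } while ( $\Psi(x^{\text{new}}, w^{\text{new}}) \ge \Psi(x, w)$ )

}

Figure 25.2. The path-following method for convex programming problems.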

(b) In terms of the $c_i$'s, give necessary and sufficient conditions for $f$ to
be convex.

(c) In terms of the $a_i$'s, give necessary and sufficient conditions for $f$ to
be convex.

(d) Assuming that $f$ is convex and is a term in the objective function
for a linearly constrained optimization problem, derive an equivalent
linear programming formulation involving at most $k$ extra variables
and constraints.

(e) Repeat the first four parts of this problem using $\max(x - b_i, 0)$ in
place of $|x - b_i|$.


25.2 Let $f$ be the function of 2 real variables defined by

$$
f(x, y) = x^2 - 2xy + y^2.
$$

Show that $f$ is convex.
25.3 A function $f$ of 2 real variables is called a monomial if it has the form

$$
f(x, y) = x^m y^n
$$

for some nonnegative integers $m$ and $n$. Which monomials are convex?

25.4 Let $\phi$ be a convex function of a single real variable. Let $f$ be a function
defined on $\mathbb{R}^n$ by the formula

$$
f(x) = \phi(a^T x + b),
$$

where $a$ is an $n$-vector and $b$ is a scalar. Show that $f$ is convex.

25.5 Which of the following functions are convex (assume that the domain of
the function is all of $\mathbb{R}^n$ unless specified otherwise)?

(a) $4x^2 - 12xy + 9y^2$

(b) $x^2 + 2xy + y^2$

(c) $x^2 y^2$

(d) $x^2 - y^2$

(e) $e^{x - y}$

(f) $e^{x^2 - y^2}$

(g)  y  on  {(x,y)  :  y  >  0} 

25.6 Given a symmetric square matrix $A$, the quadratic form $x^T A x = \sum_{i,j} a_{ij} x_i x_j$
generalizes the notion of the square of a variable. The generalization
of the notion of the fourth power of a variable is an expression of the
form

$$
f(x) = \sum_{i,j,k,l} a_{ijkl} x_i x_j x_k x_l.
$$

The four-dimensional array of numbers $A = \{a_{ijkl} : 1 \le i \le n, 1 \le j \le n, 1 \le k \le n, 1 \le l \le n\}$
is called a 4-tensor. As with quadratic
expressions, we may assume that $A$ is symmetric:

$$
a_{ijkl} = a_{jkli} = \cdots = a_{lkij}
$$

(i.e., given $i, j, k, l$, all $4! = 24$ permutations must give the same value for
the tensor).

(a) Give conditions on the 4-tensor $A$ to guarantee that $f$ is convex.

(b) Suppose that some variables, say $y_i$'s, are related to some other variables,
say $x_j$'s, in a linear fashion:


Vi  = 


i 


Express  ^  in  terms  of  the  x/s.  In  particular,  give  an  explicit 
expression  for  the  4-tensor  and  show  that  it  satisfies  the  conditions 
derived  in  part  (a). 


25.7 Consider the problem

$$
\begin{array}{ll}
\text{minimize} & a x_1 + x_2 \\
\text{subject to} & \sqrt{\varepsilon^2 + x_1^2} \le x_2,
\end{array}
$$

where $-1 < a < 1$.

(a) Graph the feasible set: $\{(x_1, x_2) : \sqrt{\varepsilon^2 + x_1^2} \le x_2\}$. Is the problem
convex?

(b) Following the steps in the middle of p. 391 of the text, write down the
first-order optimality conditions for the barrier problem associated
with barrier parameter $\mu > 0$.

(c) Solve explicitly the first-order optimality conditions. Let $(x_1(\mu), x_2(\mu))$
denote the solution.

(d) Graph the central path, $(x_1(\mu), x_2(\mu))$, as $\mu$ varies from 0 to $\infty$.


25.8 Multidimensional Taylor's series expansion. Given a function $g(t)$
defined for real values of $t$, the three-term Taylor's series expansion with
remainder is

$$
g(t + \Delta t) = g(t) + g'(t) \Delta t + \frac{1}{2} g''(t) \Delta t^2 + r_t(\Delta t).
$$

The remainder term satisfies

$$
\lim_{\Delta t \to 0} \frac{r_t(\Delta t)}{\Delta t^2} = 0.
$$

Let $f$ be a smooth function defined on $\mathbb{R}^n$. Apply the three-term Taylor's
series expansion to $g(t) = f(x + t \Delta x)$ to show that

$$
f(x + \Delta x) = f(x) + \nabla f(x)^T \Delta x + \frac{1}{2} \Delta x^T Hf(x) \Delta x + r_x(\Delta x).
$$


25.9 Consider the following convex programming problem:

$$
\begin{array}{ll}
\text{minimize} & x_2 \\
\text{subject to} & x_1^2 + x_2^2 \le 1.
\end{array}
$$

(a) Find the quadratic subproblem if the current primal solution is
$(\bar{x}_1, \bar{x}_2) = (1/2, -2/3)$ and the current dual solution is $\bar{y} = 2$.

(b) Show that for arbitrary current primal and dual solutions, the feasible
set for the convex programming problem is contained within the
feasible set for the quadratic approximation.


Notes 

Interior-point methods for nonlinear programming can be traced back to the
pioneering work of Fiacco and McCormick (1968). For more on interior-point methods
for convex programming, see Nesterov and Nemirovsky (1993) or den Hertog
(1994).

The fact that the step directions are descent directions for the merit function $\Psi$
is proved in Vanderbei and Shanno (1999).


ERRATUM 


Linear  Programming 


Robert  J.  Vanderbei 


R .J.  Vanderbei,  Linear  Programming ,  International  Series  in  Operations  Research 
&  Management  Science  196,  DOI  10.1007/978-1-4614-7630-6, 

©  Springer  Science+Business  Media  New  York  2014 


DOI 10.1007/978-1-4614-7630-6_26

The publisher regrets the errors published in the print and online versions of this
book. Corrections to Chapter 3, page 29, and Chapter 14, page 211, have been
made and can be found on the following pages.


The  updated  original  online  version  for  this  book  can  be  found  at  DOI 
10.1007/978-1-4614-7630-6 


R.J.  Vanderbei,  Linear  Programming ,  International  Series  in  Operations  Research 
& Management Science 196, DOI 10.1007/978-1-4614-7630-6_26,

©  Springer  Science+Business  Media  New  York  2017 



Chapter  3 
Degeneracy 

Page  29,  the  below  three  display  equations  were  wrong 

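$$
\begin{aligned}
\zeta &= 6 x_1 - 4 x_2 \\
w_1 &= 0 + 9 x_1 + 4 x_2 \\
w_2 &= 0 - 4 x_1 - 2 x_2 \\
w_3 &= 1 - x_2.
\end{aligned}
$$

$$
\begin{aligned}
\zeta &= 6 x_1 - 4 x_2 \\
w_1 &= 0 + \epsilon_1 + 9 x_1 + 4 x_2 \\
w_2 &= 0 + \epsilon_2 - 4 x_1 - 2 x_2 \\
w_3 &= 1 + \epsilon_3 - x_2.
\end{aligned}
$$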


c= 

—  1.5  W2  4-  X2 

W\  —  0  4~  6i  4~  2.25  62 

—  2.25  W2  —  0.5  X2 

x\  —  0 

4-0.25  e2 

—  0.25  W2  —  0.5  X2 

W3  =  1 

4- 

63  _  ^2- 

should  be  replaced  by 


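$$
\begin{aligned}
\zeta &= 6 x_1 + 4 x_2 \\
w_1 &= 0 + 9 x_1 + 4 x_2 \\
w_2 &= 0 - 4 x_1 - 2 x_2 \\
w_3 &= 1 - x_2.
\end{aligned}
$$

$$
\begin{aligned}
\zeta &= 6 x_1 + 4 x_2 \\
w_1 &= 0 + \epsilon_1 + 9 x_1 + 4 x_2 \\
w_2 &= 0 + \epsilon_2 - 4 x_1 - 2 x_2 \\
w_3 &= 1 + \epsilon_3 - x_2.
\end{aligned}
$$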


c 


w3 


0  4-  ei  4-2. 
0  +0. 
1 


1.5  W2  4-  X2 
2.25  W2  —  0.5  X2 
0.25  W2  —  0.5  X2 
-  x2. 


The  updated  original  online  version  for  this  chapter  can  be  found  at  DOI 
10.1007/978-1-4614-7630-6_3

Chapter 14

Network  Flow  Problems 

Page  211,  “Hence,  arc  (f,c)  must  enter  the  spanning  tree”  should  read  “Hence, 
arc  (f,b)  must  enter  the  spanning  tree”. 


The  updated  original  online  version  for  this  chapter  can  be  found  at  DOI 
10.1007/978-1-4614-7630-6_14


APPENDIX  A 


Source  Listings 


The algorithms presented in this book have all been implemented and are publicly
available from the author's web site:

http://www.princeton.edu/~rvdb/LPbook/

There  are  two  variants  of  the  simplex  method:  the  two-phase  method  as  shown 
in  Figure  6.1  and  the  self-dual  method  as  shown  in  Figure  7.1.  The  simplex  codes 
require  software  for  efficiently  solving  basis  systems.  There  are  two  options:  the  eta- 
matrix  approach  described  in  Section  8.3  and  the  refactorization  approach  described 
in  Section  8.5.  Each  of  these  “engines”  can  be  used  with  either  simplex  method. 
Hence,  there  are  in  total  four  possible  simplex  codes  that  one  can  experiment  with. 

There  are  three  variants  of  interior-point  methods:  the  path-following  method 
as  shown  in  Figure  18.1,  the  homogeneous  self-dual  method  shown  in  Figure  22.1 
(modified to take long steps), and the long-step homogeneous self-dual method
described in Exercise 22.4 of Chapter 22.

The source code that implements the algorithms mentioned above shares as
much common code as possible. For example, all of the codes share the same input and
output  routines  (the  input  routine,  by  itself,  is  a  substantial  piece  of  code).  They 
also  share  code  for  the  common  linear  algebra  functions.  Therefore,  the  difference 
between  two  methods  is  limited  primarily  to  the  specific  function  that  implements 
the  method  itself. 

The  total  number  of  lines  of  code  used  to  implement  all  of  the  algorithms  is 
about  9,000.  That  is  too  many  lines  to  reproduce  all  of  the  code  here.  But  the 
routines  that  actually  lay  out  the  particular  algorithms  are  fairly  short,  only  about 
300  lines  each.  The  relevant  part  of  the  self-dual  simplex  method  is  shown  starting 
on  the  next  page.  It  is  followed  by  a  listing  of  the  relevant  part  of  the  homogeneous 
self-dual  method. 


1.  The  Self-Dual  Simplex  Method 


/****************************************************************
*  Main loop                                                    *
****************************************************************/

for (iter=0; iter<MAX_ITER; iter++) {

    /*************************************************************
    * STEP 1: Find mu                                            *
    *************************************************************/

    mu = -HUGE_VAL;
    col_in  = -1;
    for (j=0; j<n; j++) {
        if (zbar_N[j] > EPS2) {
            if ( mu < -z_N[j]/zbar_N[j] ) {
                mu = -z_N[j]/zbar_N[j];
                col_in = j;
            }
        }
    }
    col_out = -1;
    for (i=0; i<m; i++) {
        if (xbar_B[i] > EPS2) {
            if ( mu < -x_B[i]/xbar_B[i] ) {
                mu = -x_B[i]/xbar_B[i];
                col_out = i;
                col_in  = -1;
            }
        }
    }
    if ( mu <= EPS3 ) {         /* OPTIMAL */
        status = 0;
        break;
    }

    if ( col_out >= 0 ) {

        /*************************************************************
        *                          -1  T                             *
        * STEP 2: Compute dz  = -(B  N) e                            *
        *                   N            i                           *
        *         where i = col_out                                  *
        *************************************************************/

        vec[0] = -1.0;
        ivec[0] = col_out;
        nvec = 1;

        btsolve( m, vec, ivec, &nvec );

        Nt_times_z( N, at, iat, kat, basicflag, vec, ivec, nvec,
                    dz_N, idz_N, &ndz_N );

        /*************************************************************
        * STEP 3: Ratio test to find entering column                 *
        *************************************************************/

        col_in = ratio_test( dz_N, idz_N, ndz_N, z_N, zbar_N, mu );

        if (col_in == -1) {     /* INFEASIBLE */
            status = 2;
            break;
        }

        /*************************************************************
        *                        -1                                  *
        * STEP 4: Compute dx  = B  N e                               *
        *                   B          j                             *
        *************************************************************/

        j = nonbasics[col_in];
        for (i=0, k=ka[j]; k<ka[j+1]; i++, k++) {
            dx_B[i] = a[k];
            idx_B[i] = ia[k];
        }
        ndx_B = i;

        bsolve( m, dx_B, idx_B, &ndx_B );

    } else {

        /*************************************************************
        *                        -1                                  *
        * STEP 2: Compute dx  = B  N e                               *
        *                   B          j                             *
        *************************************************************/

        j = nonbasics[col_in];
        for (i=0, k=ka[j]; k<ka[j+1]; i++, k++) {
            dx_B[i] = a[k];
            idx_B[i] = ia[k];
        }
        ndx_B = i;

        bsolve( m, dx_B, idx_B, &ndx_B );

        /*************************************************************
        * STEP 3: Ratio test to find leaving column                  *
        *************************************************************/

        col_out = ratio_test( dx_B, idx_B, ndx_B, x_B, xbar_B, mu );

        if (col_out == -1) {    /* UNBOUNDED */
            status = 1;
            break;
        }

        /*************************************************************
        *                          -1  T                             *
        * STEP 4: Compute dz  = -(B  N) e                            *
        *                   N            i                           *
        *************************************************************/

        vec[0] = -1.0;
        ivec[0] = col_out;
        nvec = 1;

        btsolve( m, vec, ivec, &nvec );

        Nt_times_z( N, at, iat, kat, basicflag, vec, ivec, nvec,
                    dz_N, idz_N, &ndz_N );
    }

    /*************************************************************
    * STEP 5: Put  t = x_i/dx_i,     tbar = xbar_i/dx_i,         *
    *              s = z_j/dz_j,     sbar = zbar_j/dz_j          *
    *************************************************************/

    for (k=0; k<ndx_B; k++) if (idx_B[k] == col_out) break;

    t    =    x_B[col_out]/dx_B[k];
    tbar = xbar_B[col_out]/dx_B[k];

    for (k=0; k<ndz_N; k++) if (idz_N[k] == col_in) break;

    s    =    z_N[col_in]/dz_N[k];
    sbar = zbar_N[col_in]/dz_N[k];

    /*************************************************************
    * STEP 7: Set  z_N = z_N - s dz_N,  zbar_N = zbar_N - sbar dz_N,
    *              z_i = s,             zbar_i = sbar,
    *              x_B = x_B - t dx_B,  xbar_B = xbar_B - tbar dx_B,
    *              x_j = t,             xbar_j = tbar
    *************************************************************/

    for (k=0; k<ndz_N; k++) {
        j = idz_N[k];
        z_N[j]    -= s   *dz_N[k];
        zbar_N[j] -= sbar*dz_N[k];
    }

    z_N[col_in]    = s;
    zbar_N[col_in] = sbar;

    for (k=0; k<ndx_B; k++) {
        i = idx_B[k];
        x_B[i]    -= t   *dx_B[k];
        xbar_B[i] -= tbar*dx_B[k];
    }

    x_B[col_out]    = t;
    xbar_B[col_out] = tbar;

    /*************************************************************
    * STEP 8: Update basis                                       *
    *************************************************************/

    i = basics[col_out];
    j = nonbasics[col_in];
    basics[col_out]   = j;
    nonbasics[col_in] = i;
    basicflag[i] = -col_in-1;
    basicflag[j] = col_out;

    /*************************************************************
    * STEP 9: Refactor basis and print statistics                *
    *************************************************************/

    from_scratch = refactor( m, ka, ia, a, basics, col_out, v );

    if (from_scratch) {
        primal_obj = sdotprod(c,x_B,basics,m) + f;
        printf("%8d   %14.7e %9.2e \n", iter, primal_obj, mu );
        fflush(stdout);
    }
}


2.  The  Homogeneous  Self-Dual  Method 


/****************************************************************
*  Main loop                                                    *
****************************************************************/

for (iter=0; iter<MAX_ITER; iter++) {

    /*************************************************************
    * STEP 1: Compute mu and centering parameter delta.
    *************************************************************/

    mu = (dotprod(z,x,n)+dotprod(w,y,m)+phi*psi) / (n+m+1);

    if (iter%2 == 0) {
        delta = 0.0;
    } else {
        delta = 1.0;
    }

    /*************************************************************
    * STEP 1: Compute primal and dual objective function values.
    *************************************************************/

    primal_obj = dotprod(c,x,n);
    dual_obj   = dotprod(b,y,m);

    /*************************************************************
    * STEP 2: Check stopping rule.
    *************************************************************/

    if ( mu < EPS ) {
        if ( phi > EPS ) {
            status = 0;
            break;              /* OPTIMAL */
        }
        else
        if ( dual_obj < 0.0 ) {
            status = 2;
            break;              /* PRIMAL INFEASIBLE */
        }
        else
        if ( primal_obj > 0.0 ) {
            status = 4;
            break;              /* DUAL INFEASIBLE */
        }
        else
        {
            status = 7;         /* NUMERICAL TROUBLE */
            break;
        }
    }

    /*************************************************************
    * STEP 3: Compute infeasibilities.
    *************************************************************/

    smx(m,n,A,kA,iA,x,rho);
    for (i=0; i<m; i++) {
        rho[i] = rho[i] - b[i]*phi + w[i];
    }
    normr = sqrt( dotprod(rho,rho,m) )/phi;
    for (i=0; i<m; i++) {
        rho[i] = -(1-delta)*rho[i] + w[i] - delta*mu/y[i];
    }

    smx(n,m,At,kAt,iAt,y,sigma);
    for (j=0; j<n; j++) {
        sigma[j] = -sigma[j] + c[j]*phi + z[j];
    }
    norms = sqrt( dotprod(sigma,sigma,n) )/phi;
    for (j=0; j<n; j++) {
        sigma[j] = -(1-delta)*sigma[j] + z[j] - delta*mu/x[j];
    }

    gamma = -(1-delta)*(dual_obj - primal_obj + psi) + psi - delta*mu/phi;

    /*************************************************************
    * Print statistics.
    *************************************************************/

    printf("%8d   %14.7e  %8.1e    %14.7e  %8.1e  %8.1e \n",
           iter, primal_obj/phi+f, normr,
                 dual_obj/phi+f,   norms, mu );
    fflush(stdout);

    /*************************************************************
    * STEP 4: Compute step directions.
    *************************************************************/

    for (j=0; j<n; j++) { D[j] = z[j]/x[j]; }
    for (i=0; i<m; i++) { E[i] = w[i]/y[i]; }

    ldltfac(n, m, kAt, iAt, At, E, D, kA, iA, A, v);

    for (j=0; j<n; j++) { fx[j] = -sigma[j]; }
    for (i=0; i<m; i++) { fy[i] =  rho[i]; }

    forwardbackward(E, D, fy, fx);

    for (j=0; j<n; j++) { gx[j] = -c[j]; }
    for (i=0; i<m; i++) { gy[i] = -b[i]; }

    forwardbackward(E, D, gy, gx);

    dphi = (dotprod(c,fx,n)-dotprod(b,fy,m)+gamma)/
           (dotprod(c,gx,n)-dotprod(b,gy,m)-psi/phi);

    for (j=0; j<n; j++) { dx[j] = fx[j] - gx[j]*dphi; }
    for (i=0; i<m; i++) { dy[i] = fy[i] - gy[i]*dphi; }

    for (j=0; j<n; j++) { dz[j] = delta*mu/x[j] - z[j] - D[j]*dx[j]; }
    for (i=0; i<m; i++) { dw[i] = delta*mu/y[i] - w[i] - E[i]*dy[i]; }
    dpsi = delta*mu/phi - psi - (psi/phi)*dphi;

    /*************************************************************
    * STEP 5: Compute step length (long steps).
    *************************************************************/

    theta = 0.0;
    for (j=0; j<n; j++) {
        if (theta < -dx[j]/x[j]) { theta = -dx[j]/x[j]; }
        if (theta < -dz[j]/z[j]) { theta = -dz[j]/z[j]; }
    }
    for (i=0; i<m; i++) {
        if (theta < -dy[i]/y[i]) { theta = -dy[i]/y[i]; }
        if (theta < -dw[i]/w[i]) { theta = -dw[i]/w[i]; }
    }
    theta = MIN( 0.95/theta, 1.0 );

    /*************************************************************
    * STEP 6: Step to new point
    *************************************************************/

    for (j=0; j<n; j++) {
        x[j] = x[j] + theta*dx[j];
        z[j] = z[j] + theta*dz[j];
    }
    for (i=0; i<m; i++) {
        y[i] = y[i] + theta*dy[i];
        w[i] = w[i] + theta*dw[i];
    }
    phi = phi + theta*dphi;
    psi = psi + theta*dpsi;
}


Answers  to  Selected  Exercises 


1.3: See Exercise 2.19.

2.1: $(x_1, x_2, x_3, x_4) = (2, 0, 1, 0)$, $\zeta = 17$.

2.2: $(x_1, x_2) = (1, 0)$, $\zeta = 2$.

2.3: $(x_1, x_2, x_3) = (0, 0.5, 1.5)$, $\zeta = -3$.

2.4: $(x_1, x_2, x_3) = (0, 1, 0)$, $\zeta = -3$.

2.5: $(x_1, x_2) = (2, 1)$, $\zeta = 5$.

2.6: Infeasible.

2.7: Unbounded.

2.8: $(x_1, x_2) = (4, 8)$, $\zeta = 28$.

2.9: $(x_1, x_2, x_3) = (1.5, 2.5, 0)$, $\zeta = 10.5$.

2.10: $(x_1, x_2, x_3, x_4) = (0, 0, 0, 1)$, $\zeta = 9$.

2.11: $(x_{12}, x_{13}, x_{14}, x_{23}, x_{24}, x_{34}) = (1, 0, 0, 1, 0, 1)$, $\zeta = 6$.

7.1: (1) $x^* = (2, 4, 0, 0, 0, 0, 8)$, $\zeta^* = 14$. (2) $x^*$ unchanged, $\zeta^* = 12.2$. (3)
$x^* = (0, 8, 0, 0, 0, 10, 10)$, $\zeta^* = 16$.

7.2: $\Delta c_1 \in (-\infty, 1.2]$, $\Delta c_2 \in [-1.2, \infty)$, $\Delta c_3 \in [-1, 9]$, $\Delta c_4 \in (-\infty, 2.8]$.

9.1: $(x_1, x_2) = (0, 5)$.

9.2: $(x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8) = (0, 6, 1, 15, 2, 1, 0, 0)$.

10.5: The fundamental theorem was proved only for problems in standard form.
The LP here can be reduced to standard form.

11.1: A should hide $a$ or $b$ with probabilities $b/(a + b)$ and $a/(a + b)$, respectively.
B should hide $a$ or $b$ with equal probability.

11.3:

12.1: Slope = 2/7, intercept = 1.

12.2: Slope = 1/2, intercept = 0.

12.7: (2) 340. (3) $x^*$ is chosen so that the number of months in which extra workers
will be used is equal to the number of months in the cycle (12) times the
inhouse employee cost ($17.50) divided by the outhouse employee cost ($25)
rounded down to the nearest integer.

12.8: Using $L^1$, $g = 8.951$. With $L^2$, $g = 8.924$.
13.1:

    mu               Bonds    Materials   Energy   Financial
    5.0000-          0.0000   1.0000      0.0000   0.0000
    1.9919-5.0000    0.0000   0.9964      0.0036   0.0000
    1.3826-1.9919    0.0000   0.9335      0.0207   0.0458
    0.7744-1.3826    0.0000   0.9310      0.0213   0.0477
    0.5962-0.7744    0.0000   0.7643      0.0666   0.1691
    0.4993-0.5962    0.6371   0.2764      0.0023   0.0842
    0.4659-0.4933    0.6411   0.2733      0.0019   0.0836
    0.4548-0.4659    0.7065   0.2060      0.0000   0.0875
    0.4395-0.4548    0.7148   0.1966      0.0000   0.0886
    0.2606-0.4395    0.8136   0.0952      0.0000   0.0912
    0.0810-0.2606    0.8148   0.0939      0.0000   0.0913
    0.0000-0.0810    0.8489   0.0590      0.0000   0.0922

13.1:

    mu         Hair   Cosmetics   Cash
    3.5-       1.0    0.0         0.0
    1.0-3.5    0.7    0.3         0.0
    0.5-1.0    0.5    0.5         0.0
    0.0-0.5    0.0    0.0         1.0

14.6: The optimal spanning tree consists of the following arcs:

$\{(a, b), (b, c), (c, f), (f, g), (d, g), (d, e), (g, h)\}$.


The  solution  is  not  unique. 


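17.1:

$$
\begin{aligned}
x_1 &= \left( 1 + 2\mu + \sqrt{1 + 4\mu^2} \right) / 2, \\
x_2 &= \left( 1 - 2\mu + \sqrt{1 + 4\mu^2} \right) / 2 = 2\mu / \left( -(1 - 2\mu) + \sqrt{1 + 4\mu^2} \right).
\end{aligned}
$$

17.2: Let $c = \cos\theta$. If $c \ne 0$, then $x_1 = \left( c - 2\mu + \sqrt{c^2 + 4\mu^2} \right) / 2c$, else
$x_1 = 1/2$. Formula for $x_2$ is the same except that $\cos\theta$ is replaced by $\sin\theta$.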

17.3: $\max \{ c^T x + \sum_j r_j \log x_j + \sum_i s_i \log w_i : Ax \le b, \; x \ge 0 \}$.


18.1: Using $\delta = 1/10$ and $r = 9/10$:

(1) $x = (545, 302, 644)/680$,  $y = (986, 1049)/680$,
    $w = (131, 68)/680$,  $z = (572, 815, 473)/680$.

(2) $x = (3107, 5114, 4763)/4250$,  $y = (4016, 425)/4250$,
    $w = (2783, 6374)/4250$,  $z = (3692, 1685, 2036)/4250$.

(3) $x = (443, 296)/290$,  $y = (263, 275, 347)/290$,
    $w = (209, 197, 125)/290$,  $z = (29, 176)/290$.

(4) $x = (9, 12, 8, 14)/10$,  $y = (18)/10$,
    $w = (1)/10$,  $z = (9, 6, 11, 5)/10$.

20.1:


1 

"2 

1 

1 

-1  1 

D  = 

0 

-1  1 

1 

-1  1 

0 

L 
20.3: 


1 

"-2 

1 

-3 

-I  1 

3 

D  = 

7 

3 

-1  1 

3 

6  1  i 

64 

7  3  1  _ 

21  _ 